Abstract

This paper takes a computational learning theory approach to a problem of 
linear systems identification. It is assumed that input signals have only a finite 
number k of frequency components, and systems to be identified have dimension 
no greater than n. The main result establishes that the sample complexity needed 
for identification scales polynomially with n and logarithmically with k. 
1 Introduction

The problem of systems identification may be seen as an instance of the general question 
of "learning" an unknown function. Thus, one may ask if techniques from Computational 
Learning Theory ("CLT" in what follows; see for instance [20] for a general introduction,
as well as the papers in the Special Issue [§]] of Systems and Control Letters) can be 
used to obtain insight into this central question in control theory. 

Indeed, Ljung asked precisely this question in a paper presented at a Special Session 
in Learning Theory organized at the 1996 Conference on Decision and Control, see • 
and, independently, the papers |7], || had already provided results along these lines for 



discrete-time linear systems on finite-window data. See also |pH , ||, and references 



there, as well as |12[] for several results for nonlinear systems in discrete time. 



For continuous-time systems, the situation is complicated by the fact that, even for 
finite-length inputs, learnability is impossible when formulated in the CLT framework; 
this was proved in |p3| , and, alternatively, can be seen by applying the discrete-time 



results (through sampling) from |7|, 

Thus, in this paper, we suppose that all inputs to be used, in the learning as well
as validation stages, belong to the linear span of a fixed number k of sinusoidal basis
functions. This band-limiting assumption allows us to obtain a precise result: the
sample complexity needed for identification scales polynomially with an upper bound n on
the dimension of the systems being identified, and logarithmically with k. This provides
a tight analogy to the discrete results previously obtained, in which k appeared as the
length of the discrete-time window employed.
The reader is referred to |L7| for results that apply to a class of nonlinear continuous-time
systems, but which are formulated in terms of learning derivatives evaluated at a
particular instant (as opposed to time data).

This paper is organized in a top-down fashion. Definitions and main results are given 
in Section 2, and in Section 3 we state main upper and lower bounds for the complexity
dimensions. After that we concentrate on proving the results; central techniques are
discussed in Section 4 and proofs are in Sections 5 and 6. An example of a class with
VC-dimension k is given in Section 7.



2 Definitions and Statements of Main Results 

In this section, we formulate the problem of system identification as a learning problem 
and illustrate the corresponding learning setting. We define the systems to be studied 
and, as main results, we state bounds for the number of samples needed in order to 
identify systems in this setting. 

We begin with a general introduction to classification problems. Assume that a
set $X$, to be called the input space, is given together with a collection $\mathcal{C}$ of mappings
$X \to \{0,1\}$.$^1$ Let $W$ be the set of all sequences

$$w = (u_1, \phi(u_1)), \ldots, (u_s, \phi(u_s))$$

over all $s \ge 1$, $(u_1,\ldots,u_s) \in X^s$ and $\phi \in \mathcal{C}$. An identifier is a map $\psi : W \to \mathcal{C}$. The
value of $\psi$ on a sequence $w$ above is denoted as $\psi_w$ instead of $\psi(w)$. The "error" of $\psi$
is the probability that $\psi_w$ will misclassify a future sample. More formally, the error of $\psi$
with respect to a probability measure $P$ on $X$, a $\phi \in \mathcal{C}$, and a sequence $(u_1,\ldots,u_s) \in X^s$,
is

$$\mathrm{Err}(P,\phi,u_1,\ldots,u_s) := P\{u \in X \,;\, \psi_w(u) \ne \phi(u)\}.$$

The class $\mathcal{C}$ is said to be learnable if there is some identifier $\psi$ with the following
property: For each accuracy parameter $\epsilon > 0$ and confidence parameter $\delta > 0$ there is
some $s$ so that, for every probability $P$ and every $\phi \in \mathcal{C}$,

$$P^s\{(u_1,\ldots,u_s) \in X^s \,;\, \mathrm{Err}(P,\phi,u_1,\ldots,u_s) > \epsilon\} < \delta,$$

where $P^s$ is the $s$-fold product of $P$. In the learnable case, the function $s(\epsilon,\delta)$ which
provides the smallest $s$ achieving this bound for any positive $\epsilon$ and $\delta$ is called the sample
complexity. It can be proved that learnability is equivalent to the finiteness of the
Vapnik-Chervonenkis dimension $\mathrm{VC}(\mathcal{C})$ of the class $\mathcal{C}$, which is a combinatorial quantity
describing the richness of the class $\mathcal{C}$. Moreover, for learning algorithms that classify
the observed samples correctly, the sample complexity is bounded by

$$s(\epsilon,\delta) \le \max\left\{\frac{8\,\mathrm{VC}(\mathcal{C})}{\epsilon}\log_2\frac{8e}{\epsilon},\ \frac{4}{\epsilon}\log_2\frac{2}{\delta}\right\}.$$



1 The set $X$ is assumed to be either countable or a Euclidean space, and the maps in $\mathcal{C}$ are assumed
to be measurable. In addition, a mild regularity assumption called "permissibility" is needed so that
all sets appearing below are measurable; for further discussion on the topic see an appendix in [15]. In
our context the measurability assumptions are satisfied.

In addition, there is a similar lower bound for the sample complexity.

Classification may be viewed as the problem of identifying systems with binary
outputs. More generally, we introduce a problem of identification for systems having
bounded outputs ($[0,1]$-valued, for technical reasons) via an $L^1$-error, following |fl
(for similar statements with $L^2$-error see 0). Denote by $\mathcal{F}$ a class of mappings from $X$
to $[0,1]$.

By definition, an identifier is a mapping from $\bigcup_{s \in \mathbb{N}} (X \times [0,1])^s$ to $[0,1]^X$. Such a
map takes as data a sequence of labeled samples and produces a hypothesis. If $h$ is a
$[0,1]$-valued function defined on $X$ and $P$ is a probability measure over $X \times [0,1]$, we
define the error of $h$ with respect to $P$ as



$$\mathrm{Er}_P(h) := \int_{X \times [0,1]} |h(u) - y| \, dP(u,y).$$

For $\epsilon > 0$ and $\delta > 0$ we say that an identifier $\psi$ $(\epsilon,\delta)$-learns in the agnostic sense with
respect to $\mathcal{F}$ from $s$ examples if, for all distributions $P$ on $X \times [0,1]$,

$$P^s\{w \,;\, \mathrm{Er}_P(\psi_w) > \inf_{f \in \mathcal{F}} \mathrm{Er}_P(f) + \epsilon\} < \delta.$$

Similarly, for $\epsilon > 0$ the function class $\mathcal{F}$ is said to be $\epsilon$-agnostically learnable if there is
a function $s_0 : (0,1) \to \mathbb{N}$ such that, for all $0 < \delta < 1$, there is an identifier $\psi$ which
$(\epsilon,\delta)$-learns in the above sense with $s_0(\delta)$ samples. In addition, if the identifier always
chooses a hypothesis from $\mathcal{F}$, we say that $\mathcal{F}$ is properly $\epsilon$-agnostically learnable.

For learning $[0,1]$-valued functions, a sample complexity result may be stated in
terms of a fat-shattering dimension, which is a generalization of the VC-dimension. For
$\epsilon > 0$ and $\delta > 0$ there is an identifier $\psi$ that properly $(\epsilon,\delta)$-learns in the agnostic sense
with respect to $\mathcal{F}$ from (0)

6d, 7 / 336e , 7\ , 
In — — — — In — + In 



a 2 \ln2 a \ a 3 In 2 a J S 

$$= O\left(\frac{1}{\alpha^2}\left(d\,\log^2\frac{1}{\alpha} + \log\frac{1}{\delta}\right)\right)$$

samples, where $0 < \alpha < \epsilon/4$ is chosen so that $d = \mathrm{fat}_{\epsilon/4-\alpha}(\mathcal{F})$ is finite. The quantity
$\mathrm{fat}_\gamma(\mathcal{F})$ is called the fat-shattering dimension of the class $\mathcal{F}$ and it measures the richness
of the class $\mathcal{F}$ at scale $\gamma$.

The sample complexity results show that the difficulty of system identification
in the learning-theoretic setting can be analyzed by studying various complexity
dimensions; this is the main focus of this paper. Formal definitions of the
complexity dimensions are deferred to Section 3.

2.1 Linear Systems 

In the context of learning we discuss continuous-time linear control systems: 

$$\dot{x} = Ax + Bu, \quad x(0) = x^0, \quad y = Cx, \tag{2.1}$$

where $A$, $B$, and $C$ are $n \times n$, $n \times m$, and $p \times n$ real matrices, and the time interval is
$[0,1]$. We study sign-observations (see [IB]] for related work in control theory):

$$\operatorname{sign} y(1) = (\operatorname{sign} y_1(1), \ldots, \operatorname{sign} y_p(1))^T,$$

where $\operatorname{sign} z = 0$ if $z \le 0$, $\operatorname{sign} z = 1$ if $z > 0$, and $T$ stands for the transpose. For
scalar observations this is a classification problem; each output is classified as either 0 or
1 and the VC-dimension can be used to study the complexity of the problem. When
$p > 1$, a generalization of the VC-dimension or a loss function is needed.

In general, unlike the VC-dimension associated to discrete-time linear systems [17], 12" ], 



the VC-dimension of the classification problem for continuous-time control systems is
unbounded P3] , even when $n = 1$, and the identification problem is not learnable in the
sense discussed earlier. Therefore we restrict the class of admissible controls in order to
achieve a bound for the VC-dimension. We consider controls $u = (u_1,\ldots,u_m)$ such that

$$u = G\omega,$$

where $G$ is an $m \times k$ matrix that parameterizes the control. The set of basis input functions
$\Omega = \{\omega_1,\ldots,\omega_k\}$ is fixed. The bounds for the VC-dimension or other complexity
dimensions will depend on the properties of the set $\Omega$.

For scalar inputs (i.e., $m = 1$) the VC-dimension associated to the mapping from
inputs $G$ to scalar sign-observations is bounded by $k$, which can be very large in
applications. This bound is tight; we give an example of a function class $\Omega$ for which
the associated VC-dimension is indeed $k$ (see Section 7). However, by considering
band-limited controls a better bound can be achieved. In this work we consider the following
set of basis input functions



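$$\Omega = \left\{\omega_1,\ldots,\omega_k \,;\ \begin{array}{l}\omega_1,\ldots,\omega_k \text{ linearly independent and}\\ \omega_j = t^{\ell_j}e^{\alpha_j t}\sin(\beta_j t) \text{ or } \omega_j = t^{\ell_j}e^{\alpha_j t}\cos(\beta_j t)\\ \text{with } \ell_j \in \mathbb{N},\ \alpha_j,\beta_j \in \mathbb{R},\ j = 1,\ldots,k\end{array}\right\} \tag{2.2}$$

and let

$$\ell_{\max} = \max\{\ell_1,\ldots,\ell_k\}. \tag{2.3}$$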



The results in this paper hold with minor modifications if the basis input functions
$\omega_j$, $j = 1,\ldots,k$, are, for example, linear combinations of functions of the above
form. However, the proofs and formulae are messier.

Definition 1. (Sign system concept class, $\mathcal{C}_{m,p}$) Order the set of basis input functions
$\Omega$ and denote $\omega = (\omega_1,\ldots,\omega_k)^T$. Let

$$X_\Omega = \{G\omega : [0,1] \to \mathbb{R}^m \,;\, G \in \mathbb{R}^{mk}\},$$

and for each linear system $\Sigma = (A,B,C,x^0)$ of dimension $n$ define the mapping
$\Phi_\Sigma : X_\Omega \to \mathbb{R}^p$ by $\Phi_\Sigma(G\omega) = y(1)$, where $y(1)$ is the output at time 1 of $\Sigma$ with control $u = G\omega$.
Similarly we define the mapping for sign-observations,

$$S_\Sigma : X_\Omega \to \{0,1\}^p, \quad G\omega \mapsto \operatorname{sign}(\Phi_\Sigma(G\omega)).$$

The class of the above mappings is the sign system concept class,
$\mathcal{C}_{m,p} = \{S_\Sigma \,;\, \Sigma \text{ linear system of dimension } n\}$.
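To make the sign-observation map concrete, here is a minimal Python sketch for the simplest case n = m = p = 1. It integrates the scalar system x' = a x + b u, y = c x on [0, 1] with a fixed-step Runge-Kutta scheme and then applies the sign convention used above (0 for non-positive outputs, 1 for positive ones). The helper names `basis`, `final_output` and `sign_observation`, the step count, and the restriction to a scalar system are illustrative assumptions, not part of the paper.

```python
import math

def basis(l, alpha, beta, kind):
    """Basis input omega(t) = t**l * exp(alpha*t) * sin(beta*t), or cos for kind='cos'."""
    trig = math.sin if kind == "sin" else math.cos
    return lambda t: t**l * math.exp(alpha * t) * trig(beta * t)

def final_output(a, b, c, x0, g, omegas, steps=2000):
    """y(1) for x' = a*x + b*u, y = c*x, with u(t) = sum_j g[j]*omegas[j](t) (classical RK4)."""
    def u(t):
        return sum(gj * w(t) for gj, w in zip(g, omegas))

    def f(t, x):
        return a * x + b * u(t)

    h = 1.0 / steps
    x = x0
    for i in range(steps):
        t = i * h
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h * k1 / 2)
        k3 = f(t + h / 2, x + h * k2 / 2)
        k4 = f(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return c * x

def sign_observation(y):
    """Sign convention of the text: 1 if y > 0, otherwise 0."""
    return 1 if y > 0 else 0

# Hypothetical example: two basis inputs and one coefficient vector g.
omegas = [basis(0, 0.0, 3.0, "sin"), basis(1, -0.5, 1.0, "cos")]
label = sign_observation(final_output(-1.0, 1.0, 1.0, 0.0, [0.4, -1.2], omegas))
```

In this sketch a labeled sample is the pair (g, label); the learner never sees the system parameters a, b, c.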

When studying the learning complexity associated to the classification problem we 
consider the time interval [0, 1] for simplicity, as the length of the interval plays no role 
in the result. However, for learning $[0,1]$-valued functions utilizing pseudo-dimension
without a loss function, we consider the time interval $[0,\tau]$ with $\tau > 1$.

For the system 

$$\dot{x} = Ax + Bu, \qquad y = Cx, \tag{2.4}$$

we can write $Ce^{At}B = [\gamma_1,\ldots,\gamma_m]$, where each $\gamma_i$ is a linear combination of $n$ functions
$\xi_1,\ldots,\xi_n$. Each $\xi_j$ is of the form $t^\ell e^{at}\sin(bt)$ or $t^\ell e^{at}\cos(bt)$ with $\ell \in \{0,\ldots,n-1\}$ and
$a + ib$ an eigenvalue of $A$. Assume that $A$ has a fixed Jordan block structure and let
$a_k + ib_k$ be an eigenvalue of $A$. We take $\alpha_{11},\ldots,\alpha_{mn}, a_1,b_1,\ldots,a_r,b_r$ to be the system
parameters, where $\gamma_i(t) = \sum_{j=1}^n \alpha_{ij}\xi_j(t)$ for $i = 1,\ldots,m$, and $a_1,b_1,\ldots,a_r,b_r$ are $n$
eigenvalue parameters. For example, the eigenvalue parameters for a real $4 \times 4$ matrix $A$
with eigenvalues $a_1 \pm b_1 i$, $a_2$ and $a_3$ would be $a_1$, $b_1$, $a_2$ and $a_3$. Similarly, the eigenvalue
parameters with purely complex eigenvalues $a_1 \pm b_1 i$ and $a_2 \pm b_2 i$ would be $a_1$, $b_1$, $a_2$ and
$b_2$, whereas real eigenvalue parameters would be listed as $a_1$, $a_2$, $a_3$ and $a_4$.

Let $U \subset \mathbb{R}^{mk}$ be a bounded set. Define a mapping $F : \Lambda \times U \to \mathbb{R}$ such that
$F(\lambda,G) = y(\tau)$, where $\tau > 1$ and $y(\tau)$ is a solution of (2.4) with system parameters $\lambda$
and initial condition $x(0) = 0$.

In the following definition we take the final time to be $\tau > 1$ in order to show the
effect of the time interval on the learning complexity.

Definition 2. Assume that the system $\dot{x} = Ax + Bu$, $y = Cx$, $x(0) = 0$ can
be parameterized by $\lambda \in \mathbb{R}^{n(m+1)}$ as above and $\|\lambda\|_\infty = \max_{1 \le j \le n(m+1)} |\lambda_j| \le 1$.
Let $F(\lambda,u) = y(\tau)$ be the solution of (2.4) with system parameters $\lambda$ and control
$u = (u_1,\ldots,u_m) \in U = \{u = (u_1,\ldots,u_m) \,;\, \int_0^\tau |u_i(t)|\,dt \le M,\ i = 1,\ldots,m\}$. Denote
$B_\infty^k(c) := \{x \in \mathbb{R}^k \,;\, \|x\|_\infty \le c\}$ and define

$$\mathcal{F}_B = \{F(\lambda,\cdot) : U \to \mathbb{R} \,;\, \lambda \in B_\infty^{n(m+1)}(1)\}.$$

2.2 Main Results

We formulate two theorems about bounding sample complexities as main results. In 
Section 3 we summarize upper and lower bounds for complexity dimensions studied
in this paper together with definitions. The following results, formulated as sample 
complexity statements, are immediate corollaries of learning complexity bounds proved 
in this paper, and hence no separate proofs are given. 

Theorem 2.1 (Sample complexity for concept learning). For the sign system concept
class $\mathcal{C}_{m,1}$ with scalar observations, i.e., $p = 1$, the sample complexity $s(\epsilon,\delta)$ for
identifiers that agree with the observed sample can be bounded as

$$s(\epsilon,\delta) \le \max\left\{\frac{8\,\mathrm{VC}(\mathcal{C}_{m,1})}{\epsilon}\log_2\left(\frac{8e}{\epsilon}\right),\ \frac{4}{\epsilon}\log_2\left(\frac{2}{\delta}\right)\right\},$$

where

$$\mathrm{VC}(\mathcal{C}_{m,1}) \le 2(2mn^2 + 4n + 1)\log_2\left[8e\,(8mn^2k(n+\ell_{\max})+1)\,(2nk + 2(1+2k)^n)\right]$$

and $\ell_{\max}$ is given by (2.3).

In terms of $n$ (the dimension of the state space) and $k$ (the band-width) the upper
bound for the VC-dimension is of the form $O(n^3\log_2(nk))$. The next section also states
a corresponding VC-dimension lower bound, in terms of the band-width, of the form
$O(\log k)$, and together with a lower bound for the sample complexity, this provides
an estimate for the number of samples needed in learning. In particular, in a typical
setting of fairly small system dimension $n$ and large band-width $k$, the $\log k$ bound is a
clear improvement over the linear bound given by elementary analysis.
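The logarithmic dependence on k can be checked numerically. The sketch below is a minimal Python illustration that assumes the bounds of Theorem 2.1 read VC <= 2(2mn^2 + 4n + 1) log2[8e (8mn^2 k (n + l_max) + 1)(2nk + 2(1 + 2k)^n)] and s <= max{(8 VC / eps) log2(8e / eps), (4 / eps) log2(2 / delta)}; the function names and the sample parameter values are hypothetical.

```python
import math

def vc_upper_bound(n, m, k, l_max):
    """Right-hand side of the VC-dimension bound, as read from Theorem 2.1 (an assumption)."""
    num_params = 2 * m * n**2 + 4 * n + 1
    degree_term = 8 * m * n**2 * k * (n + l_max) + 1
    count_term = 2 * n * k + 2 * (1 + 2 * k) ** n
    # The log of the product is taken as a sum of logs so that large integers do not overflow.
    log_term = math.log2(8 * math.e) + math.log2(degree_term) + math.log2(count_term)
    return 2 * num_params * log_term

def sample_complexity(vc, eps, delta):
    """Sample-size bound for identifiers consistent with the observed sample."""
    return max(8 * vc / eps * math.log2(8 * math.e / eps),
               4 / eps * math.log2(2 / delta))

# Hypothetical setting: a small system (n = 2, m = 1) and a growing band-width k.
for k in (10, 100, 1000):
    vc = vc_upper_bound(2, 1, k, 0)
    print(k, round(vc), round(sample_complexity(vc, 0.1, 0.05)))
```

Multiplying k by 100 less than doubles the bound in this setting, whereas the elementary bound VC <= k would grow a hundredfold.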

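Theorem 2.2 (Sample complexity for proper agnostic learning). Let $\kappa > 0$;
then the class $\mathcal{F}_B$, given in Definition 2, is properly agnostically learnable from

$$O\left(\frac{1}{\epsilon^2}\left(\mathrm{fat}_{(1/4-\kappa)\epsilon}(\mathcal{F}_B)\,\log^2\frac{1}{\epsilon} + \log\frac{1}{\delta}\right)\right)$$

samples, where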



^(i/±-K)e{FB) < min 



(m + l)nlog 2 



n 2 mT n e T kM 
(l/4-/c)e 



2(m + 4)nlog 2 (8e(nmk4(n + £ mSLX ) + l) (2nk + 2{2k + l) n ) 



together with $\ell_{\max}$ given by (2.2) and (2.3), and $M$ a constant satisfying

$$\int_0^\tau |u_i(\tau - t)|\,dt \le kM$$

for all $i = 1,\ldots,m$. Above, $\lfloor x \rfloor$ stands for the integer part of $x$.

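In addition to the bound above, the control systems can be parameterized linearly as
follows: Let $u(s) = \sum_{j=1}^k g_j\omega_j(s)$. Then

$$y(\tau) = C\int_0^\tau e^{A(\tau-s)}Bu(s)\,ds = C\int_0^\tau e^{A(\tau-s)}B\sum_{j=1}^k g_j\omega_j(s)\,ds = \sum_{j=1}^k g_j\underbrace{C\int_0^\tau e^{A(\tau-s)}B\omega_j(s)\,ds}_{\lambda_j(\tau)} = \sum_{j=1}^k g_j\lambda_j(\tau). \tag{2.5}$$

Then by considering the class $\mathcal{F} := \{g \in \mathbb{R}^k \mapsto \sum_{j=1}^k g_j\lambda_j(\tau) \,;\, \sum_{j=1}^k \lambda_j(\tau)^2 = 1\}$ and
further restricting $g$ so that $\|g\|_2 \le R$, [Tj| have shown that

$$\mathrm{fat}_\gamma(\mathcal{F}) \le \min\{9R^2/\gamma^2,\ k+1\} + 1.$$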


3 Complexity dimensions; main upper and lower bounds



We begin this section by defining various learning complexity dimensions and after that
we summarize the main upper and lower bounds proved in this paper. 

Definition 3 (Vapnik-Chervonenkis dimension). The richness of the collection C 
can be measured by its Vapnik-Chervonenkis (VC) dimension introduced in A set 
$S = \{x_1,\ldots,x_n\} \subset X$ is said to be shattered by $\mathcal{C}$ if, for every subset $B \subseteq S$, there
exists a set $A \in \mathcal{C}$ such that $S \cap A = B$. The Vapnik-Chervonenkis dimension of $\mathcal{C}$,
denoted $\mathrm{VC}(\mathcal{C})$, equals the largest integer $n$ such that there exists a set of cardinality
$n$ that is shattered by $\mathcal{C}$. For example, in $\mathbb{R}^k$ the VC-dimension of closed half-spaces
through the origin is $k$ [22]. Thus, if $\mathrm{VC}(\mathcal{C}) = d$, $\mathcal{C}$ is not rich enough to distinguish
all subsets of any $d+1$ element set, but there is some $d$ element set where subsets can
be distinguished. Proving exact values of the VC-dimension is hard and typically one
looks for upper and lower bounds for the VC-dimension, as is also done in this paper.
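For small finite examples the definition of shattering can be checked by brute force. The following Python sketch enumerates the labelings that a finite list of concepts induces on a candidate set and searches for the largest shattered subset of a finite domain. It is only an illustration of the definition; the function names and the two toy concept classes (thresholds and intervals on five integers) are assumptions introduced here.

```python
from itertools import combinations

def is_shattered(points, concepts):
    """True if the concepts realize all 2**len(points) labelings of the points."""
    patterns = {tuple(c(x) for x in points) for c in concepts}
    return len(patterns) == 2 ** len(points)

def vc_dimension(domain, concepts):
    """Size of the largest subset of the finite domain shattered by the concepts."""
    concepts = list(concepts)
    d = 0
    for size in range(1, len(domain) + 1):
        if any(is_shattered(s, concepts) for s in combinations(domain, size)):
            d = size
        else:
            # Subsets of shattered sets are shattered, so no larger set can work.
            break
    return d

# Toy classes on the domain {1, ..., 5}: thresholds x >= th and intervals lo <= x <= hi.
domain = [1, 2, 3, 4, 5]
thresholds = [lambda x, th=th: int(x >= th) for th in range(0, 7)]
intervals = [lambda x, lo=lo, hi=hi: int(lo <= x <= hi)
             for lo in range(0, 7) for hi in range(0, 7)]
print(vc_dimension(domain, thresholds), vc_dimension(domain, intervals))
```

The search is exponential in the size of the domain, which is one practical reason why upper and lower bounds are used instead of exact values.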

For our purposes, it is more convenient to work with shattering in terms of
dichotomies, i.e., Boolean-valued maps. We identify subsets of $D$ with Boolean functions
$\phi : D \to \{0,1\}$. Similarly, each set $C \in \mathcal{C}$ gives rise to a Boolean function on $X$, and
intersections $C \cap D$ are restrictions of functions to $D$. In this language, a subset $D \subset X$
is shattered by $\mathcal{F} := \{\phi \,;\, \phi : X \to \{0,1\}\}$ if every dichotomy on $D$ is a restriction to
$D$ of some $\phi \in \mathcal{F}$.

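Definition 4 (Pseudo-dimension with respect to a loss function). For a given
class $\mathcal{F}$ of functions $X \to Y$ and a loss function $L : Y \times Y \to [0,r]$, we introduce
for each $f \in \mathcal{F}$ the function

$$A_{f,L} : X \times Y \times \mathbb{R} \to \{0,1\}\,; \quad (x,y,\rho) \mapsto \operatorname{sign}(L(f(x),y) - \rho), \tag{3.1}$$

and let $A_{\mathcal{F},L}$ denote all such $A_{f,L}$ with $f \in \mathcal{F}$. The pseudo-dimension of $\mathcal{F}$ with respect
to the loss function $L$, $\mathrm{PD}[\mathcal{F},L]$, is defined as:

$$\mathrm{PD}[\mathcal{F},L] := \mathrm{VC}(A_{\mathcal{F},L}).$$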

The VC-dimension characterizes learnability of $\{0,1\}$-valued functions. For learning
real-valued functions we look for a generalization of the VC-dimension with similar
properties. One such generalization is the pseudo-dimension. Unfortunately, the
pseudo-dimension does not share this characterizing property of the VC-dimension; there are
learnable function classes with infinite pseudo-dimension, see [20, p. 206] and 0.

Next we define the fat-shattering dimension that corresponds to shattering with
fixed "margin" $\gamma$. Both the pseudo-dimension and the fat-shattering dimension can
be used to bound certain covering numbers and in this sense they act like the
VC-dimension. Moreover, the fat-shattering dimension gives upper and lower bounds for
covering numbers of function classes and the finiteness of the fat-shattering dimension
can characterize learnability (see M and H).
Definition 5 (Fat-shattering dimension). Let $\mathcal{F}$ be a set of real-valued functions.
We say that a set of points $X$ is $\gamma$-shattered by $\mathcal{F}$ if there are real numbers $r_x$ indexed
by $x \in X$ such that for all binary vectors $b$ indexed by $X$, there is a function $f_b \in \mathcal{F}$
satisfying

$$f_b(x) \ge r_x + \gamma \ \text{ if } b_x = 1, \quad \text{and} \quad f_b(x) \le r_x - \gamma \ \text{ otherwise.}$$

The fat-shattering dimension $\mathrm{fat}_\gamma(\mathcal{F})$ is a function from positive real numbers to integers
which maps a value $\gamma$ to the size of the largest $\gamma$-shattered set, if it is finite, or infinity
otherwise.

The shattering dimension when the margin $\gamma$ equals 0 is called the pseudo-dimension
and it is denoted by $\mathrm{PD}(\mathcal{F})$. Clearly, for all $\gamma > 0$, $\mathrm{fat}_\gamma(\mathcal{F}) \le \mathrm{PD}(\mathcal{F})$.
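The role of the witnesses r_x and the margin can be seen on a toy example. The Python sketch below checks the shattering condition with margin gamma for a finite list of functions and a given witness vector, and searches for witnesses among midpoints of pairs of function values; for a finite class this restriction loses nothing, because any admissible r_x can be replaced by the midpoint between the largest value that must lie below it and the smallest value that must lie above it. The function names and the two-function example are assumptions made for illustration.

```python
from itertools import product

def is_gamma_shattered(points, functions, witnesses, gamma):
    """Check the shattering condition with margin gamma for the given witnesses r_x."""
    for bits in product((0, 1), repeat=len(points)):
        realized = any(
            all((f(x) >= r + gamma) if b else (f(x) <= r - gamma)
                for x, r, b in zip(points, witnesses, bits))
            for f in functions
        )
        if not realized:
            return False
    return True

def find_witnesses(points, functions, gamma):
    """Return a witness tuple making the points gamma-shattered, or None if there is none."""
    candidates = []
    for x in points:
        values = sorted({f(x) for f in functions})
        # Midpoints between distinct function values at x are sufficient candidates.
        candidates.append([(u + v) / 2 for u in values for v in values if u < v])
    for witnesses in product(*candidates):
        if is_gamma_shattered(points, functions, witnesses, gamma):
            return witnesses
    return None

# Toy class {x -> x, x -> -x}: the single point 1.0 is shattered at margin 1 but not 1.5.
toy = [lambda x: x, lambda x: -x]
print(find_witnesses([1.0], toy, 1.0), find_witnesses([1.0], toy, 1.5))
```

Increasing the margin can only shrink the collection of shattered sets, which is why the fat-shattering dimension is non-increasing in the margin and bounded by the pseudo-dimension.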



3.1 Bounds 

We begin by stating bounds in the easiest learning setting: classifying the final state
observations as either 0 or 1.

Theorem 3.1. (VC-dimension upper bound, $p = 1$) The VC-dimension of the sign
system concept class $\mathcal{C}_{m,1}$ with scalar observations can be bounded as

$$\mathrm{VC}(\mathcal{C}_{m,1}) \le 2(2mn^2 + 4n + 1)\log_2\left[8e\,(8mn^2k(n+\ell_{\max})+1)\,(2nk + 2(1+2k)^n)\right],$$

where $\ell_{\max}$ is given by (2.3).

In terms of $n$ (the dimension of the state space) and $k$ (the band-width) the upper
bound is of the form $O(n^3\log_2(nk))$.

All VC-dimension upper bounds are based on the fact that input basis functions
satisfy a certain rationality condition. Remark 5.3 indicates how the bound is formed
when the input functions satisfy the more abstract rationality condition. In that case
the degrees of the polynomials and the number of polynomial evaluations are different.
However, in terms of $n$ and $k$, the bound is of the same form. VC-dimension or
pseudo-dimension bounds stated in this paper can be modified for the rationality condition in
the same way.

The lower bound for the VC-dimension is in terms of n and k. It holds for linearly 
independent continuous basis input functions and compared to upper bounds, no par- 
ticular form of the functions is needed. The bound is obtained by imposing a specific 
structure on control systems, and a lower bound for a restricted class of control systems 
provides a lower bound for more general classes. 

Theorem 3.2. (VC-dimension lower bound, m = 1, p = 1) 







k 






> max 1 m! 


log 2 










ml 







where ml = min{n, k}. 
In terms of $k$ the upper and the lower bound match up to a constant. For $n$ and $k$ the
lower bound is typically of the form $O(n\log_2(k/n))$. Note that if the system dimension
$n$ is small compared to the band-width $k$ the VC-dimension upper and lower bounds
in Theorems 3.1 and 3.2 become tighter, both being of the form $c\log_2 k$ (with different
values of the constant $c$).

Extending the upper bounds to the case of vector-valued observations can be done
in various ways based on the result obtained for scalar observations. For example, we
may consider the $p$-dimensional output as bits representing a number in $\{0,\ldots,2^p-1\}$
and introduce a loss function for each $f \in \mathcal{C}_{m,p}$ as $L_{0\text{-}1,f}(z,a) = L_{0\text{-}1}(f(z),a) = 1$ when
$f(z) \ne a$, and 0 otherwise. We define the VC-dimension of the $p$-dimensional observation
as the VC-dimension of the above class of loss functions. Modifying the argument used
with scalar observations leads to a bound of the following form:

Theorem 3.3. (VC-dimension upper bound)

$$\mathrm{VC}(\mathcal{C}_{m,p}) \le 2(2pmn^2 + 4n + p)\log_2\left[8e\,(8mn^2k(n+\ell_{\max})+1)\,(2^p - 1 + 2p(2k+1)^n + 2nk)\right],$$

where $\ell_{\max}$ is given by (2.3).

Next we state the main result concerning learnability of the actual input-output 
mapping, i.e., learning without taking the sign of the final state observation. 

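Definition 6. (Control system concept class, $\mathcal{G}_{p,L}$) Let
$\mathcal{F} = \{\Phi_\Sigma : X_\Omega \to \mathbb{R}^p \,;\, \Sigma \text{ linear system of dimension } n\}$ and define the control system concept class as
$\mathcal{G}_{p,L} = A_{\mathcal{F},L}$, where $A_{\mathcal{F},L}$ is given by (3.1).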

Methods for calculating upper bounds for the VC-dimension readily give a tool for 
obtaining upper bounds for the pseudo-dimension with respect to loss functions that 
preserve the rationality structure of the output. A typical example is illustrated by the 
loss function 

$$L(z_1,z_2) = \frac{(z_1-z_2)^2}{1+(z_1-z_2)^2} \tag{3.2}$$

and the following result: 

Theorem 3.4. (Pseudo-dimension upper bound, $p = 1$)

$$\mathrm{PD}(\mathcal{G}_{1,L}) \le 2(2mn^2 + 4n + 1)\log_2\left[16e\,(8mn^2k(n+\ell_{\max})+1)\,(2nk + 2(2k+1)^n)\right],$$

where the loss function $L$ is given by (3.2) and $\ell_{\max}$ is given by (2.3).



This differs from the corresponding VC-dimension bound only by the maximum 
degree of the polynomials, which is doubled. Extending this pseudo-dimension bound 
for p-dimensional observations can be done naturally by modifying the loss function. 
Lower bounds for the VC-dimension are lower bounds for the pseudo-dimension as such. 
The next results summarize upper bounds for the fat-shattering dimension. We begin
by illustrating how fat-shattering dimension can be bounded for Lipschitz functions in 
certain cases: 

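Theorem 3.5 (Fat-shattering bound). Let $F(\lambda,u) : \mathbb{R}^k \times U \to \mathbb{R}$ be such that
$F(\cdot,u)$ is Lipschitz with constant $L$, i.e., $|F(\lambda_1,u) - F(\lambda_2,u)| \le L\|\lambda_1 - \lambda_2\|$ for all
$u \in U$. For any subset $B \subset \mathbb{R}^k$, consider the following class of functions:

$$\mathcal{F}_B = \{F(\lambda,\cdot) : U \to \mathbb{R} \,;\, \lambda \in B\}.$$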



Then 



where 



faty^B* (Cf) ) < k\og 2 



CL 

7 



CL 

7 



fat 7 (^B^(c)) < fcl og 2 f 1 + 
fat 7 (JFg 21 ) <A:log 2 (c + ^j 



S* (C7) = {x G 1R 
£^(C) = {x G R fc 



max IxJ < C}, 

l<i<fc 

max \xA < C}, 



E 2 fc (C) = {* G M fc ; ||x|| 2 = x ' ^ C >- 



Theorem 3.6 (Fat-shattering bound for a control system). Assume that the
system $\dot{x} = Ax + Bu$, $y = Cx$, $x(0) = 0$ can be parameterized by $\lambda \in \mathbb{R}^{n(m+1)}$ as in
Definition 2 and $\|\lambda\|_\infty \le 1$. Let $F(\lambda,u) = y(\tau)$ be the solution with system parameters $\lambda$
and control $u = (u_1,\ldots,u_m) \in U = \{u = (u_1,\ldots,u_m) \,;\, \int_0^\tau |u_i(t)|\,dt \le M,\ i = 1,\ldots,m\}$.
Define

$$\mathcal{F}_B = \{F(\lambda,\cdot) : U \to \mathbb{R} \,;\, \lambda \in B_\infty^{n(m+1)}(1)\}.$$

Then

fat 7 (jFe) < n(m + 1) log 2 



n mr n e T M 
7 



4 Techniques for Proving VC-Dimension Results 

Our main results are based on a fact that the basis input functions satisfy a certain 
rationality condition. In this section we first formulate this rationality condition and 
then we summarize existing results that are used in proving upper and lower bounds for 
the complexity dimensions. 
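We recall briefly the control system setting. We study systems

$$\dot{x} = Ax + Bu, \quad x(0) = x^0, \quad y = Cx,$$
$$u = G\omega, \ \text{ and } \omega_j \in \Omega \text{ for } j = 1,\ldots,k,$$

with basis input functions

$$\Omega = \left\{\omega_1,\ldots,\omega_k \,;\ \begin{array}{l}\omega_1,\ldots,\omega_k \text{ linearly independent and}\\ \omega_j = t^{\ell_j}e^{\alpha_j t}\sin(\beta_j t) \text{ or } \omega_j = t^{\ell_j}e^{\alpha_j t}\cos(\beta_j t)\\ \text{with } \ell_j \in \mathbb{N},\ \alpha_j,\beta_j \in \mathbb{R},\ j = 1,\ldots,k\end{array}\right\},$$

such that

$$\ell_{\max} = \max\{\ell_1,\ldots,\ell_k\}.$$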

Definition 7. (Rationality condition (RAT)) Let $n$ be a positive integer. We say
that a bounded function $\omega : [0,1] \to \mathbb{R}$ satisfies the rationality condition relative to the
class of $n$-dimensional systems if there exist $h$ polynomial functions $f_1,\ldots,f_h : \mathbb{R}^4 \to \mathbb{R}$
and $2qn$ rational functions $r_{ij\ell}$, $i \in \{1,2\}$, $j \in \{1,\ldots,q\}$ and $\ell \in \{1,\ldots,n\}$, with no
poles on subsets $S_{ij\ell}$ of $\mathbb{R}^4$, such that the following properties hold:

1. For each $i \in \{1,2\}$, $\ell \in \{1,\ldots,n\}$, $\mathbb{R}^4$ is a disjoint union of $S_{i1\ell},\ldots,S_{iq\ell}$.

2. Each $S_{ij\ell}$ can be defined in terms of a Boolean expression involving $[f_1 = 0],\ldots,[f_h = 0]$,
where we say that for functions $f_1,\ldots,f_h : \mathbb{R}^4 \to \mathbb{R}$, $[f_i = 0]$ has value 1
if $f_i(x_1,x_2,x_3,x_4) = 0$, and 0 otherwise.

3. Letting $r_{i\ell} : \mathbb{R}^4 \to \mathbb{R}$, $i \in \{1,2\}$, $\ell \in \{1,\ldots,n\}$ be defined as

$$r_{i\ell}(x) = r_{ij\ell}(x) \quad \text{if } x \in S_{ij\ell},$$

then for each $a, b \in \mathbb{R}$, and for all $\ell \in \{1,\ldots,n\}$,

$$\int_0^1 t^{\ell-1}e^{at}\cos(bt)\,\omega(t)\,dt = r_{1\ell}(a,b,e^a\cos b,e^a\sin b),$$
$$\int_0^1 t^{\ell-1}e^{at}\sin(bt)\,\omega(t)\,dt = r_{2\ell}(a,b,e^a\cos b,e^a\sin b).$$

We denote by $d_{\max}$ the maximum degree of any polynomial (i.e., $f_1,\ldots,f_h$, numerators
and denominators of the $r_{ij\ell}$'s) appearing in the rationality condition.

Remark 4.1. First, entries of $e^{At}$ are functions of the form $t^s e^{at}\cos(bt)$ and $t^s e^{at}\sin(bt)$.
Solving (2.1) involves convolutions of $e^{At}$ and the basis input functions $\omega_j$, and we require
those to be rational functions.
Example 4.2. Let $\omega(t) = \sin(ct)$, with nonzero $c$. Then

$$\int_0^1 e^{at}\sin(bt)\,\omega(t)\,dt = \frac{1}{2}\int_0^1 e^{at}\cos((b-c)t)\,dt - \frac{1}{2}\int_0^1 e^{at}\cos((b+c)t)\,dt;$$

after integration this can be split into cases with no poles, yielding

$$\begin{cases}\dfrac{p(a,b,e^a\cos b,e^a\sin b)}{2\,(a^2+(b+c)^2)(a^2+(b-c)^2)} & \text{if } f_1 \ne 0,\ f_2 \ne 0,\\[3mm] \dfrac{\sin b\cos b - b}{2b} & \text{if } f_1 = 0,\ f_2 \ne 0,\\[3mm] \dfrac{b - \sin b\cos b}{2b} & \text{if } f_1 \ne 0,\ f_2 = 0,\end{cases}$$

where

$$f_1(a,b,e^a\cos b,e^a\sin b) = a^2 + (b+c)^2,$$
$$f_2(a,b,e^a\cos b,e^a\sin b) = a^2 + (b-c)^2,$$

and $p(a,b,e^a\cos b,e^a\sin b)$ stands for the polynomial

$$\begin{aligned} &-4abc + 4abc\,e^a\cos b\cos c - 2a^2c\,e^a\cos c\sin b + 2b^2c\,e^a\cos c\sin b\\ &- 2c^3e^a\cos c\sin b - 2a^2b\,e^a\cos b\sin c - 2b^3e^a\cos b\sin c + 2bc^2e^a\cos b\sin c\\ &+ 2a^3e^a\sin b\sin c + 2ab^2e^a\sin b\sin c + 2ac^2e^a\sin b\sin c.\end{aligned}$$
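The generic (pole-free) case of this example can be checked numerically. The Python sketch below evaluates the polynomial p listed above, divides it by 2(a^2 + (b + c)^2)(a^2 + (b - c)^2), and compares the result with a composite Simpson approximation of the integral of e^{at} sin(bt) sin(ct) over [0, 1]. The factor 2 in the denominator is an assumption of this sketch: it is the normalization for which the listed polynomial agrees with direct integration. The function names and test values are hypothetical.

```python
import math

def closed_form(a, b, c):
    """p / (2 f1 f2), with x3 = e^a cos b and x4 = e^a sin b as in the rationality condition."""
    x3 = math.exp(a) * math.cos(b)
    x4 = math.exp(a) * math.sin(b)
    cc, sc = math.cos(c), math.sin(c)
    f1 = a**2 + (b + c)**2
    f2 = a**2 + (b - c)**2
    p = (-4*a*b*c + 4*a*b*c*x3*cc
         - 2*a**2*c*x4*cc + 2*b**2*c*x4*cc - 2*c**3*x4*cc
         - 2*a**2*b*x3*sc - 2*b**3*x3*sc + 2*b*c**2*x3*sc
         + 2*a**3*x4*sc + 2*a*b**2*x4*sc + 2*a*c**2*x4*sc)
    return p / (2 * f1 * f2)

def simpson(f, n=2000):
    """Composite Simpson rule on [0, 1] with an even number n of subintervals."""
    h = 1.0 / n
    total = f(0.0) + f(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3

def numeric(a, b, c):
    """Direct numerical value of the integral of e^{at} sin(bt) sin(ct) over [0, 1]."""
    return simpson(lambda t: math.exp(a * t) * math.sin(b * t) * math.sin(c * t))

gap = abs(closed_form(0.3, 1.2, 0.7) - numeric(0.3, 1.2, 0.7))
```

For fixed c the closed form is a rational function of (a, b, e^a cos b, e^a sin b), with cos c and sin c entering only as constants, which is exactly what the rationality condition asks for.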



Lemma 4.3. Each $\omega \in \Omega$ given by (2.2) satisfies the rationality condition. Further,
the maximum degree of polynomials in (RAT) is at most $4(n+\ell_{\max})$, where $\ell_{\max}$ is given
by (2.3).



Review of VC-Dimension Techniques 

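In the context of control theory it is sometimes easier to work with the dual
VC-dimension. Assume that a function $F : \Lambda \times X \to \{0,1\}$ is given. This induces two
function classes

$$\mathcal{F} := \{F(\lambda,\cdot) : X \to \{0,1\} \,;\, \lambda \in \Lambda\}$$

and

$$\mathcal{F}^* := \{F(\cdot,x) : \Lambda \to \{0,1\} \,;\, x \in X\}.$$

The complexity dimension $\mathrm{VC}(\mathcal{F}^*)$ is called the dual VC-dimension of $\mathcal{F}$ and it is related
to $\mathrm{VC}(\mathcal{F})$ as follows [|C

$$\mathrm{VC}(\mathcal{F}) \ge \lfloor\log_2 \mathrm{VC}(\mathcal{F}^*)\rfloor, \tag{4.1}$$

where $\lfloor x \rfloor$ is the integer part of $x$.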

A sharper estimate can be obtained if $\Lambda$ can be written as a product $\Lambda_1 \times \cdots \times \Lambda_n$.
The following construction and result are due to DasGupta and Sontag [7]. We study
in particular those dichotomies that are defined on "rectangular" subsets of $\Lambda$. Let
$L = L_1 \times \cdots \times L_n$ be a subset of $\Lambda$ such that for each $i$, $L_i \subseteq \Lambda_i$ is nonempty. Given
any index $1 \le \kappa \le n$, a $\kappa$-axis dichotomy on $L$ is any function $\delta : L \to \{0,1\}$ which
depends only on the $\kappa$th coordinate, i.e., there is some function $\phi : L_\kappa \to \{0,1\}$ so
that $\delta(\lambda_1,\ldots,\lambda_n) = \phi(\lambda_\kappa)$ for all $(\lambda_1,\ldots,\lambda_n) \in L$. We say that a mapping is an axis
dichotomy if it is a $\kappa$-axis dichotomy for some $\kappa$. A rectangular set $L$ is said to be axis
shattered by $\mathcal{F}^*$ if every axis dichotomy is a restriction to $L$ of some function of the form
$F(\cdot,x) : \Lambda \to \{0,1\}$, for some $x \in X$.

Theorem 4.4. (Axis shattering bound [7]) If $L = L_1 \times \cdots \times L_n \subseteq \Lambda$ can be axis
shattered and each $L_i$ has cardinality $r_i > 0$, then $\mathrm{VC}(\mathcal{F}) \ge \lfloor\log_2(r_1)\rfloor + \cdots + \lfloor\log_2(r_n)\rfloor$.

Upper bounds for VC-dimensions of concept classes that are obtained by evaluating 
polynomial equalities and inequalities can be obtained in terms of the number and 
degrees of the polynomials: 

Theorem 4.5. (Goldberg-Jerrum bound [10]) Given a function $F : \Lambda \times X \to \{0,1\}$
and the associated concept class $\mathcal{F} := \{F(\lambda,\cdot) : X \to \{0,1\} \,;\, \lambda \in \Lambda\}$, suppose that
$\Lambda = \mathbb{R}^\ell$ and $X = \mathbb{R}^k$. Let $F$ be defined in terms of a Boolean formula involving at
most $s$ polynomial equalities and inequalities in $\ell + k$ variables, each polynomial being
of degree at most $d$ in $\lambda$ for all $x \in \mathbb{R}^k$. Then $\mathrm{VC}(\mathcal{F}) \le 2\ell\log_2(8eds)$.

The Goldberg-Jerrum bound is based on a result showing that the number of sign
assignments $\{-1,0,1\}$ to polynomials cannot grow too fast:

Theorem 4.6. ([10]) Suppose that $f_1,\ldots,f_m$ are polynomials of degree at most $d$ in
$n \le m$ variables. Then the number of distinct vectors

$$[\operatorname{sign} f_1(x),\ldots,\operatorname{sign} f_m(x)] \in \{-1,0,1\}^m$$

that can be generated by varying $x$ over $\mathbb{R}^n$ is at most $(8edm/n)^n$.
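Both bounds are simple enough to evaluate directly. The Python sketch below computes the Goldberg-Jerrum bound 2 l log2(8 e d s) and the sign-vector bound (8 e d m / n)^n, and compares the latter with the trivial count 3^m. The function and parameter names are hypothetical; only the two formulas are taken from the theorems above.

```python
import math

def goldberg_jerrum_bound(num_params, degree, num_polys):
    """VC-dimension bound 2*l*log2(8*e*d*s) for l parameters, degree d, s polynomial tests."""
    return 2 * num_params * math.log2(8 * math.e * degree * num_polys)

def sign_vector_bound(num_polys, degree, num_vars):
    """Bound (8*e*d*m/n)**n on the number of sign vectors of m polynomials in n <= m variables."""
    return (8 * math.e * degree * num_polys / num_vars) ** num_vars

# Ten quadratics in two variables: the bound is already below the trivial count 3**10.
print(sign_vector_bound(10, 2, 2), 3 ** 10)
print(goldberg_jerrum_bound(3, 2, 5))
```

The sign-vector bound is polynomial in the number m of polynomials for a fixed number of variables, while the trivial count 3^m is exponential; this is what makes the VC-dimension bound logarithmic in s.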



5 Proofs of the VC-dimension bounds 

5.1 An Upper Bound for the VC-Dimension with Scalar Observations

We begin this section by proving Lemma 4.3, stating that the input basis functions satisfy
the rationality condition (RAT), and bounding the degrees of polynomials appearing in
(RAT). As a proposition we formalize how control systems can be parameterized. After
that, as a lemma, we develop an upper bound for the VC-dimension induced by the
control system (2.1) with its initial state fixed to be zero. Theorem 3.1 with an arbitrary
initial condition is then a simple modification of the argument.



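Proof of Lemma 4.3. If $\omega(t) = t^\ell e^{\alpha t}\sin(\beta t)$ or $\omega(t) = t^\ell e^{\alpha t}\cos(\beta t)$ with $\ell \le \ell_{\max}$, then
in place of

$$\int_0^1 t^\ell e^{at}\sin(bt)\,\omega(t)\,dt \quad \text{or} \quad \int_0^1 t^\ell e^{at}\cos(bt)\,\omega(t)\,dt \tag{5.1}$$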
Jo Jo 
by combining exponents and using sum formulae for sin and cos (see Example 4.2) it is
enough to study terms of the form

$$\int_0^1 t^k e^{\tilde{a}t}\sin(\tilde{b}t)\,dt \quad \text{or} \quad \int_0^1 t^k e^{\tilde{a}t}\cos(\tilde{b}t)\,dt,$$



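where $k \in \{0,\ldots,n+\ell_{\max}-1\}$. In fact, each expression in (5.1) is one of the following
types:

$$\frac{1}{2}\int_0^1 t^k e^{\tilde{a}t}\cos((b-\beta)t)\,dt - \frac{1}{2}\int_0^1 t^k e^{\tilde{a}t}\cos((b+\beta)t)\,dt,$$
$$\frac{1}{2}\int_0^1 t^k e^{\tilde{a}t}\cos((b+\beta)t)\,dt + \frac{1}{2}\int_0^1 t^k e^{\tilde{a}t}\cos((b-\beta)t)\,dt,$$
$$\frac{1}{2}\int_0^1 t^k e^{\tilde{a}t}\sin((b+\beta)t)\,dt - \frac{1}{2}\int_0^1 t^k e^{\tilde{a}t}\sin((b-\beta)t)\,dt,$$
$$\frac{1}{2}\int_0^1 t^k e^{\tilde{a}t}\sin((b+\beta)t)\,dt + \frac{1}{2}\int_0^1 t^k e^{\tilde{a}t}\sin((b-\beta)t)\,dt,$$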



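where $\tilde{a} = a + \alpha$. Because for $k \ge 1$

$$\int_0^1 t^k e^{\tilde{a}t}\sin(\tilde{b}t)\,dt = \frac{e^{\tilde{a}}}{\tilde{a}^2+\tilde{b}^2}\,(\tilde{a}\sin\tilde{b} - \tilde{b}\cos\tilde{b}) - \frac{k}{\tilde{a}^2+\tilde{b}^2}\int_0^1 t^{k-1}e^{\tilde{a}t}\,(\tilde{a}\sin(\tilde{b}t) - \tilde{b}\cos(\tilde{b}t))\,dt \tag{5.2}$$

and

$$\int_0^1 e^{\tilde{a}t}\sin(\tilde{b}t)\,dt = \frac{e^{\tilde{a}}(\tilde{a}\sin\tilde{b} - \tilde{b}\cos\tilde{b}) + \tilde{b}}{\tilde{a}^2+\tilde{b}^2} \tag{5.3}$$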

and a similar formulae for t k e at cos(bt)dt hold, we see by induction that the numerator 

of J t k e &t sm(bt)dt is a polynomial of a, 6, e a cos b and e a sin 6. By using sum formulae 
for sin and cos, the previous expression is in turn a polynomial of a, b, e a cos b and e a sin b 
because a = a + a and b = b ± (3 for some fixed a and /3. By similar arguments, the 
denominator is a polynomial of a and b. Note that, for example, e a equals a constant 
times e a , so this process does not change the degrees of the polynomials. 
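The recursion behind (5.2) and (5.3) can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names `sin_integral`, `cos_integral` and `simpson` are illustrative, and the cosine recursion is the "similar formula" mentioned above, written out here as an assumption obtained by the same integration by parts.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def sin_integral(k, a, b):
    """Integral of t^k e^{a t} sin(b t) over [0, 1], via (5.2) and (5.3)."""
    d = a * a + b * b  # assumed nonzero, i.e. we stay away from the pole
    boundary = math.exp(a) * (a * math.sin(b) - b * math.cos(b))
    if k == 0:
        return (boundary + b) / d
    rest = a * sin_integral(k - 1, a, b) - b * cos_integral(k - 1, a, b)
    return boundary / d - k / d * rest


@lru_cache(maxsize=None)
def cos_integral(k, a, b):
    """Cosine counterpart (assumed analogue of (5.2) and (5.3))."""
    d = a * a + b * b
    boundary = math.exp(a) * (a * math.cos(b) + b * math.sin(b))
    if k == 0:
        return (boundary - a) / d
    rest = a * cos_integral(k - 1, a, b) + b * sin_integral(k - 1, a, b)
    return boundary / d - k / d * rest


def simpson(f, n=2000):
    """Composite Simpson rule on [0, 1] with n (even) subintervals."""
    h = 1.0 / n
    total = f(0.0) + f(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3
```

Each recursion step divides by one more factor of $\tilde a^2 + \tilde b^2$, which is the mechanism behind the denominator degrees discussed next.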

Further, observe that the denominator of $\int_0^1 t^{\hat\ell}e^{at}\sin(bt)\,\omega(t)\,dt$ consists of at most two products of the variables $a$ and $b$ of the form $((a+\alpha)^2 + (b\pm\beta)^2)^{\ell+\hat\ell+1}$, and similarly for the $\cos(bt)$ term. Let us index the basis input functions $\omega_1,\dots,\omega_k$ so that $\omega_\kappa$ has parameters $\alpha_\kappa$ and $\beta_\kappa$. Hence the functions $f_i$ in (RAT), defining the subsets without poles, can be taken as

$$\{(a+\alpha_\kappa)^2 + (b-\beta_\kappa)^2,\ (a+\alpha_\kappa)^2 + (b+\beta_\kappa)^2 \ ;\ \kappa = 1,\dots,k\}.$$

Furthermore, the sets $S_i$ are as simple as

$$\bigcup_{\kappa=1}^{k}\big\{\{(x_1,x_2,x_3,x_4)\ ;\ x_1 = -\alpha_\kappa,\ x_2 = -\beta_\kappa\}\cup\{(x_1,x_2,x_3,x_4)\ ;\ x_1 = -\alpha_\kappa,\ x_2 = \beta_\kappa\}\big\}$$

and

$$\mathbb{R}^4\setminus\bigcup_{\kappa=1}^{k}\big\{\{(x_1,x_2,x_3,x_4)\ ;\ x_1 = -\alpha_\kappa,\ x_2 = -\beta_\kappa\}\cup\{(x_1,x_2,x_3,x_4)\ ;\ x_1 = -\alpha_\kappa,\ x_2 = \beta_\kappa\}\big\}.$$

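We turn to estimating the maximum degree of the polynomials appearing in (RAT). We already saw that the functions $f_i$ are polynomials of degree 2. Equations (5.2) and (5.3) show that the degree of the numerator is not higher than that of the denominator. We claim that

$$\int_0^1 t^{k}e^{\tilde a t}\left\{\begin{matrix}\sin(\tilde b t)\\ \cos(\tilde b t)\end{matrix}\right\}dt = \frac{P(2(k+1))}{(\tilde a^2+\tilde b^2)^{k+1}} \quad\text{for } k = 0,1,\dots$$

where $P(2(k+1))$ stands for some polynomial in $\tilde a$, $\tilde b$, $e^{\tilde a}\sin(\tilde b)$ and $e^{\tilde a}\cos(\tilde b)$ of degree $2(k+1)$. Clearly, the claim is true for $k = 0$ by (5.3), and the inductive argument follows from (5.2). Assuming the claim true for $k-1$, we get

$$\begin{aligned}
\int_0^1 t^{k}e^{\tilde a t}\sin(\tilde b t)\,dt
&= \frac{P(2)}{\tilde a^2+\tilde b^2} - \frac{k\tilde a}{\tilde a^2+\tilde b^2}\int_0^1 t^{k-1}e^{\tilde a t}\sin(\tilde b t)\,dt + \frac{k\tilde b}{\tilde a^2+\tilde b^2}\int_0^1 t^{k-1}e^{\tilde a t}\cos(\tilde b t)\,dt\\
&= \frac{P(2)}{\tilde a^2+\tilde b^2} + \frac{P(1)}{\tilde a^2+\tilde b^2}\,\frac{P(2k)}{(\tilde a^2+\tilde b^2)^{k}} + \frac{P(1)}{\tilde a^2+\tilde b^2}\,\frac{P(2k)}{(\tilde a^2+\tilde b^2)^{k}}\\
&= \frac{P(2)(\tilde a^2+\tilde b^2)^{k} + 2P(2k+1)}{(\tilde a^2+\tilde b^2)^{k+1}} = \frac{P(2(k+1))}{(\tilde a^2+\tilde b^2)^{k+1}},
\end{aligned}$$

where each $P(\cdot)$ denotes an unspecified polynomial of the indicated degree, so that signs and constants are absorbed. The $\cos(\tilde b t)$ term is treated similarly, concluding the proof of the claim. As a corollary of the claim,

$$\int_0^1 t^{\hat\ell}e^{at}\sin(bt)\,\omega(t)\,dt = \frac{P(2(k+1))}{(\tilde a^2 + (b+\beta)^2)^{k+1}} + \frac{P(2(k+1))}{(\tilde a^2 + (b-\beta)^2)^{k+1}},$$

where $k = \ell + \hat\ell$ and $\tilde a = a + \alpha$.

Hence the maximum degree of the denominators of the expressions in (5.1) is $2(k+1) + 2(k+1) = 4(k+1)$ with $k \in \{0,\dots,\ell_{\max}+n-1\}$. Thus the maximum degree of the polynomials appearing in (RAT) is $4(n+\ell_{\max})$. □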

The next proposition indicates how control systems are parameterized and later the 
concept or function classes associated to control systems are obtained by varying the 
parameter vector. 

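Proposition 5.1. Denote the basis input functions by $\omega = (\omega_1,\dots,\omega_k)^T$, assume that each $\omega_i$, $i = 1,\dots,k$, satisfies the rationality condition (RAT), and let $\Lambda = \mathbb{R}^{2pn^2m}\times\mathbb{R}^{4n}\times\mathbb{R}^{p}$. Then there exists a mapping $H : \Lambda\times\mathbb{R}^{mk}\to\mathbb{R}^{p}$ (depending on $\omega$) such that for each $\Sigma = (A,B,C,x^0)$ there exists a $\lambda\in\Lambda$ satisfying

$$\Phi_\Sigma(G\omega) = H(\lambda,G) \quad\text{for all } G\in\mathbb{R}^{mk}.$$

Proof. Given a system $\Sigma = (A,B,C,x^0)$,

$$\Phi_\Sigma(u) = y(1) = Ce^{A}x^0 + C\int_0^1 e^{A(1-t)}Bu(t)\,dt.$$

By an argument based on the real Jordan form of $e^{At}$, the entries of $e^{A(1-t)}$ are linear combinations of functions of the form $t^{\ell}e^{at}\cos(bt)$ and $t^{\ell}e^{at}\sin(bt)$, where $\ell\in\{0,\dots,n-1\}$ and $a+ib$ is an eigenvalue of $A$. Hence we define the following $2n$ functions:

$$\begin{aligned}
\xi_1(a,b,t) &= e^{at}\cos(bt)\\
\xi_2(a,b,t) &= te^{at}\cos(bt)\\
&\ \,\vdots\\
\xi_n(a,b,t) &= t^{n-1}e^{at}\cos(bt)\\
\xi_{n+1}(a,b,t) &= e^{at}\sin(bt)\\
&\ \,\vdots\\
\xi_{2n}(a,b,t) &= t^{n-1}e^{at}\sin(bt).
\end{aligned}$$

By the rationality condition (RAT), for all $\ell = 1,\dots,2n$

$$\int_0^1 \xi_\ell(a,b,t)\,\omega_j(t)\,dt = \frac{P_{\ell j}(a,b,e^{a}\cos b,e^{a}\sin b)}{Q_{\ell j}(a,b,e^{a}\cos b,e^{a}\sin b)}$$

for all $a, b\in\mathbb{R}$, where $P_{\ell j}$ and $Q_{\ell j}$ are piecewise polynomial expressions.

Let $H(\mathcal{A},X,h,G) = (H_1,\dots,H_p)^T$, where for $1\le\kappa\le p$

$$H_\kappa = \sum_{i=1}^{m}\sum_{r=1}^{n}\sum_{\ell=1}^{2n} a_{ir\ell\kappa}\sum_{j=1}^{k} g_{ij}\,\frac{P_{\ell j}(x_{r1},x_{r2},x_{r3},x_{r4})}{Q_{\ell j}(x_{r1},x_{r2},x_{r3},x_{r4})} + h_\kappa$$

and

$$\mathcal{A} = (a_{ir\ell\kappa})_{\substack{i=1,\dots,m\\ r=1,\dots,n\\ \ell=1,\dots,2n\\ \kappa=1,\dots,p}},\qquad X = (x_{r\eta})_{\substack{r=1,\dots,n\\ \eta=1,\dots,4}},\qquad h = (h_1,\dots,h_p)^T,\qquad G = (g_{ij})_{\substack{i=1,\dots,m\\ j=1,\dots,k}}.$$

Next, we relate $\Phi_\Sigma$ and $H$, and we write

$$Ce^{A(1-t)}B = \begin{pmatrix}\gamma_{11} & \cdots & \gamma_{1m}\\ \vdots & & \vdots\\ \gamma_{p1} & \cdots & \gamma_{pm}\end{pmatrix}.$$

We list the eigenvalues of $A$ as $a_r + ib_r$ for $r = 1,\dots,n$ and let $\xi_{r\ell}(t) = \xi_\ell(a_r,b_r,t)$ for $r = 1,\dots,n$ and $\ell = 1,\dots,2n$. Then there exists some $(a_{ir\ell\kappa})$ such that

$$\gamma_{\kappa i}(t) = \sum_{r=1}^{n}\sum_{\ell=1}^{2n} a_{ir\ell\kappa}\,\xi_{r\ell}(t). \tag{5.4}$$

Let $\lambda = (\mathcal{A},X,h)$, where $\mathcal{A}$ satisfies (5.4), $X = (x_{r\eta})$ with $x_{r1} = a_r$, $x_{r2} = b_r$, $x_{r3} = e^{a_r}\cos b_r$ and $x_{r4} = e^{a_r}\sin b_r$, and $h = Ce^{A}x^0$. We claim that

$$H(\lambda,G) = \Phi_\Sigma(G\omega) \quad\text{for all } G\in\mathbb{R}^{mk}.$$

Note that the $\kappa$-th component of $\Phi_\Sigma(G\omega)$ is given by

$$\begin{aligned}
\int_0^1\sum_{i=1}^{m}\gamma_{\kappa i}(t)\,u_i(t)\,dt + h_\kappa
&= \sum_{i=1}^{m}\sum_{r=1}^{n}\sum_{\ell=1}^{2n} a_{ir\ell\kappa}\int_0^1 \xi_{r\ell}(t)\sum_{j=1}^{k} g_{ij}\,\omega_j(t)\,dt + h_\kappa\\
&= \sum_{i=1}^{m}\sum_{r=1}^{n}\sum_{\ell=1}^{2n} a_{ir\ell\kappa}\sum_{j=1}^{k} g_{ij}\int_0^1 \xi_\ell(a_r,b_r,t)\,\omega_j(t)\,dt + h_\kappa\\
&= \sum_{i=1}^{m}\sum_{r=1}^{n}\sum_{\ell=1}^{2n} a_{ir\ell\kappa}\sum_{j=1}^{k} g_{ij}\,\frac{P_{\ell j}(x_{r1},x_{r2},x_{r3},x_{r4})}{Q_{\ell j}(x_{r1},x_{r2},x_{r3},x_{r4})} + h_\kappa\\
&= H_\kappa(\mathcal{A},X,h,G).
\end{aligned}$$

□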



Next we take $p = 1$ and study the VC-dimension of the sign system concept class, $\mathcal{C}_{m,1}$, where each control parameterized by $G$ gives rise to $\operatorname{sign} y(1)$.

Lemma 5.2. The sign system concept class $\mathcal{C}_{m,1}$ with initial condition $x(0) = 0$ satisfies

$$\mathrm{VC}(\mathcal{C}_{m,1}) \le 2(2mn^2+4n)\log_2\!\big(8e\,(8mn^2k(n+\ell_{\max})+1)\,(2nk+2(1+2k)^n)\big),$$

where $\ell_{\max}$ is given by (2.3).



Proof. By Proposition 5.1, $y(1) = H(\lambda,G)$, where $\lambda\in\mathbb{R}^{2mn^2}\times\mathbb{R}^{4n}$ are considered as parameters. In fact, $y(1) = P/Q$, where $P$ and $Q$ denote piecewise polynomial functions. As in the statement of the Goldberg-Jerrum bounds, we have a function $F : \Lambda\times\mathbb{R}^{mk}\to\{0,1\}$ defined by $F(\lambda,G) = \operatorname{sign} H(\lambda,G)$. The concept class associated to the system identification problem is $\mathcal{F} := \{F(\lambda,\cdot) : \mathbb{R}^{mk}\to\{0,1\}\ ;\ \lambda\in\Lambda\}$, where $\Lambda = \mathbb{R}^{2mn^2+4n}$. Before applying the Goldberg-Jerrum bound we need to determine the possible degrees of $P$ and $Q$ with respect to the parameters.

The rationality condition implies that

$$\max_{1\le j\le k}\{\deg(P_{\ell j}),\deg(Q_{\ell j})\}\le d_{\max}.$$

Then

$$\frac{P_{i\ell}}{Q_{i\ell}} = \sum_{j=1}^{k} g_{ij}\,\frac{P_{\ell j}}{Q_{\ell j}},$$

so $\deg(Q_{i\ell})\le kd_{\max}$ and $\deg(P_{i\ell})\le kd_{\max}$. Note here that we are calculating the degree with respect to the system parameters, and the inputs $g_{ij}$ do not contribute.

By continuing in a similar fashion and combining the $r$-summation with the $\ell$-summation in Proposition 5.1, we write $P_i/Q_i = \sum_{\ell=1}^{2n^2} a_{i\ell}P_{i\ell}/Q_{i\ell}$ to conclude that $\deg(Q_i)\le 2n^2\deg(Q_{i\ell}) = 2n^2kd_{\max}$ and $\deg(P_i)\le 2n^2kd_{\max}+1$. Finally, $P/Q = \sum_{i=1}^{m}P_i/Q_i$ with $\deg(Q)\le m2n^2kd_{\max}$ and $\deg(P)\le m2n^2kd_{\max}+1$.

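Recall that with $p = 1$ and initial condition $x(0) = 0$, using the notation of Proposition 5.1,

$$y(1) = \sum_{i=1}^{m}\sum_{r=1}^{n}\sum_{\ell=1}^{2n} a_{ir\ell}\sum_{j=1}^{k} g_{ij}\int_0^1\xi_\ell(x_{r1},x_{r2},t)\,\omega_j(t)\,dt.$$

The proof of Lemma 4.3 indicates that the denominator of

$$\int_0^1\xi_\ell(x_{r1},x_{r2},t)\,\omega_j(t)\,dt$$

equals

$$\big((x_{r1}+\alpha_j)^2 + (x_{r2}+\beta_j)^2\big)^{z_{\ell j}}\big((x_{r1}+\alpha_j)^2 + (x_{r2}-\beta_j)^2\big)^{z_{\ell j}},$$

where $\alpha_j$, $\beta_j$ are fixed parameters of the basis input function $\omega_j$ and $z_{\ell j}\in\mathbb{N}$.

By carrying out the summations we get $y(1) = P/Q$, where $Q$ consists of powers of the polynomials $f_{ij1}$, $f_{ij2}$ with

$$f_{ij1}(\mathcal{A},X,G) = (x_{i1}+\alpha_j)^2 + (x_{i2}+\beta_j)^2,$$
$$f_{ij2}(\mathcal{A},X,G) = (x_{i1}+\alpha_j)^2 + (x_{i2}-\beta_j)^2,$$

and $i = 1,\dots,n$, $j = 1,\dots,k$.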

Our final step before applying the Goldberg-Jerrum bound is finding the number $s$ of polynomial inequalities needed in the Boolean formula evaluating the sign of the final state output. This is done by studying the number of different $P/Q$ expressions without poles.

An upper bound for the number of different $P/Q$ expressions without poles can be obtained by applying Theorem 4.6 to the $2nk$ polynomials $f_{ij1}$, $f_{ij2}$, $i = 1,\dots,n$ and $j = 1,\dots,k$, viewing those as polynomials of $2n$ variables, each of degree 2. This gives the upper bound $(16ek)^{2n}$.

However, a more specific bound can be obtained in this problem. Note that by varying $x_{i1}$ and $x_{i2}$ we can make at most one of the $2k$ polynomials $f_{ij1}$, $f_{ij2}$, $j = 1,\dots,k$, equal to zero. For example, $\gamma$ zeros among $f_{ij1}$, $f_{ij2}$, $i = 1,\dots,n$ and $j = 1,\dots,k$, can be obtained in $(2k)^{\gamma}\binom{n}{\gamma}$ ways, and the number of possible sign assignments is obtained by summing over $\gamma$, yielding

$$\sum_{\gamma=0}^{n}(2k)^{\gamma}\binom{n}{\gamma} = (1+2k)^{n}.$$

Thus the number of $P/Q$ expressions without poles is $(1+2k)^n$, which gives rise to $2(1+2k)^n$ polynomials.
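The counting step can be illustrated by brute force. The sketch below is only an illustration under the stated model: each of the $n$ eigenvalue indices either makes none of its $2k$ polynomials vanish or makes exactly one of them vanish. The function names are hypothetical.

```python
from itertools import product
from math import comb


def count_zero_patterns(n, k):
    """Enumerate patterns: per index, 0 = no zero, 1..2k = which polynomial is zero."""
    return sum(1 for _ in product(range(2 * k + 1), repeat=n))


def binomial_sum(n, k):
    """Sum over the number g of indices at which some polynomial vanishes."""
    return sum(comb(n, g) * (2 * k) ** g for g in range(n + 1))
```

Both functions agree with the closed form $(1+2k)^n$ on small cases, which is polynomial in $k$ for fixed $n$, in contrast to a generic count over all $2nk$ polynomials.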

Note that in order to write $\operatorname{sign} y(1)$ as a Boolean formula evaluating polynomial inequalities and equalities, the formula also has to include the $2nk$ polynomials $f_{ij1}$, $f_{ij2}$, $i = 1,\dots,n$, $j = 1,\dots,k$. The values of these polynomials determine which $P/Q$ expression is the valid one for determining $\operatorname{sign} y(1)$. The Boolean formula for $\operatorname{sign} y(1)$ can be given as a truth table involving polynomial inequalities in the $2nk$ expressions $f_{ij1}$, $f_{ij2}$ and the $2(1+2k)^n$ different $P$ and $Q$ expressions.

Using Lemma 4.3 for the bound on $d_{\max}$, we apply the Goldberg-Jerrum bound with $s = 2nk + 2(2k+1)^n$, $d = m2n^2k\,4(n+\ell_{\max}) + 1$ and $\ell = 2mn^2 + 4n$. □
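To get a feel for how the bound of Lemma 5.2 scales, the following sketch evaluates it numerically. It assumes the Goldberg-Jerrum bound in the form $2\ell\log_2(8eds)$, which is the form consistent with the statement of Lemma 5.2; the function names are illustrative.

```python
import math


def goldberg_jerrum(num_params, degree, num_predicates):
    """Assumed form of the bound: 2 * l * log2(8 * e * d * s)."""
    return 2 * num_params * math.log2(8 * math.e * degree * num_predicates)


def vc_upper_bound(m, n, k, l_max):
    """Bound of Lemma 5.2 for m inputs, dimension n, k basis inputs."""
    s = 2 * n * k + 2 * (2 * k + 1) ** n        # polynomial predicates
    d = 8 * m * n ** 2 * k * (n + l_max) + 1    # maximum degree
    params = 2 * m * n ** 2 + 4 * n             # number of system parameters
    return goldberg_jerrum(params, d, s)
```

For fixed $m$ and $n$, doubling $k$ changes the value only by an additive term, which reflects the logarithmic dependence on $k$.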

A simple example of a piecewise polynomial function $P/Q$, together with the decision table for the final output, is provided in Appendix A.

Remark 5.3. The VC-dimension bound is modified for the more abstract rationality conditions as follows. Evaluating the sign of the output involves the evaluation of $2(8e\,d_{\max}2n^2kh/4n)^{4n} + 2n^2kh$ polynomials; $2n^2kh$ evaluations are needed to find an appropriate piece, and by Theorem 4.6 the maximum number of possible expressions of the type $P/Q$ is bounded by $(8e\,d_{\max}2n^2kh/4n)^{4n}$. Applying the Goldberg-Jerrum bound with $s = 2(8e\,d_{\max}2n^2kh/4n)^{4n} + 2n^2kh$, $d = m2n^2kd_{\max} + 1$ and $\ell = 2mn^2 + 4n$ gives the result.



Proof of Theorem 3.1, the VC-dimension upper bound, $p = 1$. Using the previous notation, $y = Ce^{A}x^0 + C\int_0^1 e^{A(1-t)}Bu(t)\,dt$. Let $x = Ce^{A}x^0$. Then $y = x + P/Q = (xQ + P)/Q = \tilde P/Q$. This has $2mn^2 + 4n + 1$ parameters and $\deg(\tilde P)\le m2n^2kd_{\max} + 1$. □



Remark 5.4. Taking $2n^2$ functions $\xi_{r\ell}(t)$, $r = 1,\dots,n$, $\ell = 1,\dots,2n$, in Proposition 5.1 is clear overcounting. Further calculations indicate that $O(n\ln n)$ functions $\xi_{r\ell}(t)$ would be enough. For more details see Appendix B.

5.2 Lower Bounds for the VC-Dimension 

The lower bounds for the VC-dimension are developed for a single-input single-output system with initial state zero. The control is

$$u(t) = \sum_{j=1}^{k} g_j\,\omega_j(t).$$

We derive lower bounds by fixing the structure of $A$, $B$, and $C$, and using the dual VC-dimension and axis shattering, following the ideas of DasGupta and Sontag. Lemmata 5.6, 5.7 and 5.8 given in this section together prove Theorem 3.2. These lower bounds are very general; we just assume that the input functions are continuous and linearly independent, so no particular structure on the input functions is required, unlike in the upper bounds.

To make the next proof cleaner we formulate a part of it as a separate proposition. (The proposition is a standard fact, but we include a short proof for completeness of exposition.)

Proposition 5.5. Let $\omega_j : [0,1]\to\mathbb{R}$, $j = 1,\dots,k$, be continuous and linearly independent. Then the functions

$$h_j(\lambda) = \int_0^1 e^{\lambda t}\omega_j(t)\,dt,\qquad j = 1,\dots,k,$$

are linearly independent.

Proof. We begin by showing that for any $f\in C[0,1] = \{f : [0,1]\to\mathbb{R}\ ;\ f \text{ continuous}\}$,

$$\int_0^1 e^{\lambda t}f(t)\,dt = 0 \quad\text{for all } \lambda\in\mathbb{R}$$

implies $f(t) = 0$ on $[0,1]$. The above integral is an analytic function of $\lambda$. Moreover, its value and all of its derivatives in $\lambda$ are zero at $\lambda = 0$, which gives that $\int_0^1 t^{k}f(t)\,dt = 0$ for all $k = 0,1,2,\dots$. As $f$ is continuous and polynomials are dense in $C[0,1]$, we have that $f(t) = 0$.

Linear independence of $h_1,\dots,h_k$ follows: if $\alpha_1h_1(\lambda) + \cdots + \alpha_kh_k(\lambda) = 0$ for all $\lambda\in\mathbb{R}$, then

$$\int_0^1 e^{\lambda t}\big(\alpha_1\omega_1(t) + \cdots + \alpha_k\omega_k(t)\big)\,dt = 0,\qquad \forall\lambda\in\mathbb{R},$$

and by the above argument $\alpha_1\omega_1(t) + \cdots + \alpha_k\omega_k(t) = 0$ for all $t$. As $\omega_1,\dots,\omega_k$ were assumed to be linearly independent, we have that $(\alpha_1,\dots,\alpha_k) = (0,\dots,0)$. □

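Lemma 5.6 (Lower bound 1). The sign system concept class $\mathcal{C}_{1,1}$ with scalar inputs and scalar outputs satisfies

$$\mathrm{VC}(\mathcal{C}_{1,1}) \ge m'\left\lfloor\log_2\left\lfloor\frac{k}{m'}\right\rfloor\right\rfloor,$$

where $m' = \min\{n,k\}$.

Proof. Let $\omega_j(t)$, $j = 1,\dots,k$, be continuous and linearly independent. Let $A$ have $n$ distinct real eigenvalues $-\lambda_1,\dots,-\lambda_n$, and take $B$ and $C$ so that

$$Ce^{A(1-t)}B = \sum_{i=1}^{m'} e^{\lambda_i t},$$

where $m' = \min\{n,k\}$. Then the final output of the system is

$$y(1) = \int_0^1 Ce^{A(1-t)}B\sum_{j=1}^{k} g_j\,\omega_j(t)\,dt = \sum_{i=1}^{m'}\sum_{j=1}^{k} g_j\int_0^1 e^{\lambda_i t}\omega_j(t)\,dt.$$

Define $h_j(\lambda) = \int_0^1 e^{\lambda t}\omega_j(t)\,dt$. By Proposition 5.5 the $h_j$'s are linearly independent and we can find $\lambda_1,\dots,\lambda_k$ such that the matrix

$$\begin{pmatrix} h_1(\lambda_1) & \cdots & h_k(\lambda_1)\\ \vdots & & \vdots\\ h_1(\lambda_k) & \cdots & h_k(\lambda_k)\end{pmatrix}$$

has rank $k$.

The control system with sign observations gives the mapping $F : \mathbb{R}^{m'}\times\mathbb{R}^{k}\to\{0,1\}$ by

$$(\lambda_1,\dots,\lambda_{m'},g_1,\dots,g_k)\mapsto\operatorname{sign}\left[\sum_{i=1}^{m'}\sum_{j=1}^{k} g_j\,h_j(\lambda_i)\right].$$

We show that the mapping from the parameters $\lambda_1,\dots,\lambda_{m'}$ to $\{0,1\}$ can be axis shattered. Let $L = \{\lambda_1,\dots,\lambda_k\}$ be such that $[h_j(\lambda_i)]_{ij}$ has rank $k$. Denote by $L_1,\dots,L_{m'}$ disjoint subsets of $L$ such that $|L_i| = \lfloor k/m'\rfloor$ and let $M = L\setminus\bigcup_{i=1}^{m'}L_i$. Next we want to interpolate in the points of $L$.

Fix $s$, $1\le s\le m'$, and let $\phi : L_s\to\{0,1\}$ be any dichotomy. Next find $g_1^*,\dots,g_k^*$ such that

$$\begin{aligned}
\sum_{j=1}^{k} g_j\,h_j(\lambda_s) &= \phi(\lambda_s), && \forall\lambda_s\in L_s,\\
\sum_{j=1}^{k} g_j\,h_j(\lambda) &= 0, && \forall\lambda\in(L\cup M)\setminus L_s.
\end{aligned}\tag{5.5}$$

Let $g_1^*,\dots,g_k^*$ satisfy (5.5). (A unique solution exists because $[h_j(\lambda_i)]$ has rank $k$.) Then

$$F[\lambda_1,\dots,\lambda_{m'},g_1^*,\dots,g_k^*] = \operatorname{sign}\left[\sum_{i=1}^{m'}\sum_{j=1}^{k} g_j^*\,h_j(\lambda_i)\right] = \phi(\lambda_s),$$

when $\lambda_s\in L_s$ and for all $(\lambda_1,\dots,\lambda_{m'})\in L_1\times\cdots\times L_{m'}$.

Let $\mathcal{F} = \{F(\lambda_1,\dots,\lambda_{m'},\cdot) : \mathbb{R}^{k}\to\{0,1\}\ ;\ (\lambda_1,\dots,\lambda_{m'})\in\mathbb{R}^{m'}\}$. By the axis shattering bound,

$$\mathrm{VC}(\mathcal{F}) \ge m'\left\lfloor\log_2\left\lfloor\frac{k}{m'}\right\rfloor\right\rfloor,$$

and thus $\mathrm{VC}(\mathcal{C}_{1,1})\ge\mathrm{VC}(\mathcal{F})$, where $\mathcal{C}_{1,1}$ is the control system concept class with $p = m = 1$. □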



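Lemma 5.7 (Lower bound 2). If $k\le n$ then

$$\mathrm{VC}(\mathcal{C}_{1,1})\ge k.$$

Proof. We make a small modification of the above argument. Assume that $k\le n$ and let $A$ have $n$ real eigenvalues $\lambda_1,\dots,\lambda_n$. Next we take $B$ and $C$ so that $Ce^{A(1-t)}B = \sum_{i=1}^{n} e^{\lambda_i t}\beta_i$, where $(\beta_1,\dots,\beta_n,\lambda_1,\dots,\lambda_n)$ are considered as system parameters. We study the mapping

$$(\beta_1,\dots,\beta_n,\lambda_1,\dots,\lambda_n,g_1,\dots,g_k)\mapsto\operatorname{sign}\left[\sum_{i=1}^{n}\sum_{j=1}^{k} g_j\,h_j(\lambda_i)\,\beta_i\right] = \operatorname{sign}\left[\sum_{j=1}^{k} g_j\sum_{i=1}^{n} h_j(\lambda_i)\,\beta_i\right].$$

Given $(\gamma_1,\dots,\gamma_k)$, by linear independence of $h_1,\dots,h_k$ we can find $\lambda_1,\dots,\lambda_n,\beta_1,\dots,\beta_n$ such that $\sum_{i=1}^{n} h_j(\lambda_i)\beta_i = \gamma_j$, $j = 1,\dots,k$. But $(\gamma_1,\dots,\gamma_k)$ can be viewed as a normal vector for a hyperplane through the origin in $\mathbb{R}^{k}$, and the concept class associated to the mapping

$$(g_1,\dots,g_k)\mapsto\operatorname{sign}\left[\sum_{j=1}^{k} g_j\gamma_j\right]$$

as $(\gamma_1,\dots,\gamma_k)$ varies has VC-dimension $k$. Hence $\mathrm{VC}(\mathcal{C}_{1,1})\ge k$. □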



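Lemma 5.8 (Lower bound 3). If $n\le k$ then

$$\mathrm{VC}(\mathcal{C}_{1,1})\ge n.$$

Proof. Our construction for the control system is as in the previous proof, but now we assume that $n\le k$ and we study

$$(\beta_1,\dots,\beta_n,\lambda_1,\dots,\lambda_n,g_1,\dots,g_k)\mapsto\operatorname{sign}\left[\sum_{i=1}^{n}\sum_{j=1}^{k} g_j\,h_j(\lambda_i)\,\beta_i\right] = \operatorname{sign}\left[\sum_{i=1}^{n}\Big(\sum_{j=1}^{k} g_j\,h_j(\lambda_i)\Big)\beta_i\right] = \operatorname{sign}\left[\sum_{i=1}^{n}\tilde g_i\,\beta_i\right],$$

where $\tilde g_i = \sum_{j=1}^{k} g_j h_j(\lambda_i)$. Again by linear independence and the above hyperplane argument (now via first transforming $(g_1,\dots,g_k)$) we can conclude that the above mapping has VC-dimension $n$. Thus $\mathrm{VC}(\mathcal{C}_{1,1})\ge n$. □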



5.3 VC-Dimension Upper Bounds for p-Dimensional Outputs 

We begin by proving Theorem 3.3.

Proof of the VC-dimension upper bound. We develop an upper bound based on the bound for a scalar sign-observation. We have seen that under the rationality assumption (RAT) the scalar output is a piecewise rational expression $P/Q$. In general, the control system maps $G$ to $(\operatorname{sign}(P_1/Q_1),\dots,\operatorname{sign}(P_p/Q_p))^T$, which is understood as a binary representation of a number in $\{0,1,\dots,2^p-1\}$. Let $f : \mathbb{R}^{mk}\to\{0,\dots,2^p-1\}$ be the mapping given by the control system, and denote the class of all such mappings by $\mathcal{F}$. For each $f\in\mathcal{F}$ introduce a loss function $L_{0\text{-}1,f}(z,a) = L_{0\text{-}1}(f(z),a) = 1$ when $f(z)\ne a$, and $0$ otherwise. Define the class $L_{0\text{-}1,\mathcal{F}} = \{L_{0\text{-}1,f}\ ;\ f\in\mathcal{F}\}$.

In order to calculate the value of the output, after determining an appropriate piece, one needs to know the truth values of the expressions $P_1 > 0$, $Q_1 > 0,\dots,P_p > 0$ and $Q_p > 0$, where the $P$'s and $Q$'s are polynomials in the inputs and parameters of the control system. To evaluate the value of the loss function $L_{0\text{-}1,f}(z,a)$, one needs the truth values of $y = 0$, $y = 1,\dots,y = 2^p-2$.

In the general case one needs $2nk + 2p(2k+1)^n + 2^p - 1$ truth values. As this procedure evaluates only polynomials, we can use the Goldberg-Jerrum bound again. The maximum degree of the polynomials is $m2n^2k\,4(n+\ell_{\max}) + 1$, and the total number of parameters is $2pn^2m + 4n + p$, where the last term comes from the initial condition.

□

Remark 5.9. An upper bound can be derived by using the uniform $\Psi$-dimension defined by Ben-David et al. [6]. As before, let $f : \mathbb{R}^{mk}\to\{0,\dots,2^p-1\}$ be the mapping given by the control system. Let $\Psi = \{\psi^1,\dots,\psi^p\}$ be a collection of distinguishers $\psi^j : \{0,\dots,2^p-1\}\to\{0,1\}$ given by $\psi^j(\operatorname{sign}(y_1),\dots,\operatorname{sign}(y_p)) = \operatorname{sign}(y_j)$.

Let $\psi^j_f : \mathbb{R}^{mk}\to\{0,1\}$ be given by $\psi^j_f(z) = \psi^j(f(z))$ and $\psi^j_{\mathcal{F}} = \{\psi^j_f\ ;\ f\in\mathcal{F}\}$. An upper bound for $\mathrm{VC}(\psi^j_{\mathcal{F}})$ is given by Theorem 3.1. Ben-David et al. define the uniform $\Psi$-dimension of $\mathcal{F}$, in which only one distinguisher is selected to shatter, as

$$\Psi\text{-}\dim_U(\mathcal{F}) = \max\{\mathrm{VC}(\psi^j_{\mathcal{F}})\ ;\ \psi^j\in\Psi\}.$$

That is, the upper bound for the uniform $\Psi$-dimension is just the upper bound obtained earlier for scalar observations. However, the uniform $\Psi$-dimension cannot be compared to the VC-dimension as such. For example, the formula for sample complexity with the $\Psi$-dimension is different: to calculate the sample complexity, the uniform $\Psi$-dimension is multiplied by $p\log_2(2^p-1)$; see [6, Theorem 27].



6 A Fat-Shattering Bound 



We begin this section by proving Theorems 3.5 and 3.6. As a corollary of Theorem 3.6 we prove the fat-shattering bound appearing in Theorem 2.2, which bounds the sample complexity for proper agnostic learning.



Proof of Theorem 3.5. For the first part of the proof we use a generic set $B$ for the parameters. Assume that we can $\gamma$-shatter a set of inputs $\{u_1,\dots,u_d\}$, that is, there exists $\{r_1,\dots,r_d\}$ such that, for each assignment $b\in\{0,1\}^d$, there exists a $\lambda\in B$ such that

$$F(\lambda,u_i)\ge r_i+\gamma \ \text{ if } b_i = 1,\quad\text{and}\quad F(\lambda,u_i)\le r_i-\gamma \ \text{ otherwise.}$$

We write $\lambda\sim\mu$ if and only if the parameters $\lambda$ and $\mu$ give the same assignment for all $\{u_1,\dots,u_d\}$. Further, let $\Lambda = \{\lambda_1,\dots,\lambda_{2^d}\}$ be a collection of parameters that shatter $\{u_1,\dots,u_d\}$ and let $\lambda_i,\lambda_j\in\Lambda$. Now $\lambda_i\ne\lambda_j$ implies that there exist $u^*\in\{u_1,\dots,u_d\}$ and $r^*\in\{r_1,\dots,r_d\}$ such that $F(\lambda_i,u^*)\ge r^*+\gamma$ and $F(\lambda_j,u^*)\le r^*-\gamma$, or vice versa. Hence

$$2\gamma\le|F(\lambda_i,u^*) - F(\lambda_j,u^*)|\le L\|\lambda_i-\lambda_j\|$$

and so $\|\lambda_i-\lambda_j\|\ge 2\gamma/L$. That is, the set $\Lambda$ of cardinality $2^d$ is a $2\gamma/L$-separated set in $B$. Now the fat-shattering bounds follow by calculating $2\gamma/L$-packing numbers for different sets $B$.

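If $B = B^{k}_{\infty}(C)$, the maximum possible cardinality of an $\epsilon$-separated set is $\lfloor 2C/\epsilon\rfloor^{k}$, and thus

$$2^d\le\left\lfloor\frac{2C}{2\gamma/L}\right\rfloor^{k} = \left\lfloor\frac{CL}{\gamma}\right\rfloor^{k},$$

and solving for $d$ yields $d\le k\log_2\lfloor CL/\gamma\rfloor$.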

Similarly, if B = B^(C) the maximum possible cardinality for an e-separated 
set is $(1+\lfloor 2C/\epsilon\rfloor)^{k}$ and by a similar argument we arrive at the bound $d\le k\log_2(1+\lfloor LC/\gamma\rfloor)$.

For $B = B^{k}_{2}(C)$, let $P(\epsilon)$ be an $\epsilon$-separated set of points in $B^{k}_{2}(C)$ and let $|P(\epsilon)|$ denote its cardinality. As all open balls of radius $\epsilon/2$ centered at $\epsilon$-separated points have to be disjoint, and their union has to be inside a ball of radius $C+\epsilon/2$, we get that $|P(\epsilon)|\,a(k)(\epsilon/2)^{k}\le a(k)(C+\epsilon/2)^{k}$, where $a(k) = \pi^{k/2}/\Gamma(k/2+1)$ is the volume of a unit ball in $\mathbb{R}^{k}$. Hence $|P(\epsilon)|\le(1+2C/\epsilon)^{k}$ and, with $\epsilon = 2\gamma/L$,

$$2^d\le(1+CL/\gamma)^{k},$$

i.e., $d\le k\log_2(1+CL/\gamma)$.



□ 
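The packing argument for the sup-norm case translates directly into a small calculation. The sketch below is illustrative only: the function name is hypothetical, and returning 0 when $CL/\gamma < 1$ is an assumption of this sketch (in that regime two parameters cannot produce outputs that differ by $2\gamma$, so no point can be $\gamma$-shattered).

```python
import math


def fat_bound_sup_ball(k, C, L, gamma):
    """Upper bound k * log2(floor(C * L / gamma)) for a sup-norm ball of radius C.

    k: number of parameters, L: Lipschitz constant in the parameters,
    gamma: shattering margin.
    """
    cells = math.floor(C * L / gamma)
    if cells < 1:
        return 0.0  # assumption of this sketch: nothing can be gamma-shattered
    return k * math.log2(cells)
```

For example, `fat_bound_sup_ball(4, 1.0, 12.0, 0.5)` evaluates $4\log_2 24$; halving the margin adds at most about $k$ to the bound, reflecting the logarithmic dependence on $1/\gamma$.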



Next we prove Theorem 3.6 by applying the Lipschitz bound to a control system. 



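Proof of Theorem 3.6. Our aim is to compute the Lipschitz constant associated to the parameterized control system defined earlier, and then to apply Theorem 3.5.

Denote the system parameters $(a_{11},\dots,a_{mn},a_1,b_1,\dots,a_r,b_r)$ by $\lambda$ and assume $\|\lambda\|_\infty\le 1$. Let

$$F(\lambda,u) = y(\tau) = \int_0^\tau\sum_{i=1}^{m}\sum_{\ell=1}^{n} a_{i\ell}\,\xi_\ell(t)\,u_i(\tau-t)\,dt.$$

The functions $\xi_1(t),\dots,\xi_n(t)$ are of the form $\xi(t) = t^{c}e^{at}\sin(bt)$ or $\xi(t) = t^{c}e^{at}\cos(bt)$, where $a+ib$ is an eigenvalue of $A$ and $c\in\{0,\dots,n-1\}$. Thus taking a partial derivative with respect to $a$ or $b$ will increase the power of $t$ by one and change the trigonometric functions. Therefore,

$$\left|\frac{\partial F(\lambda,u)}{\partial a_{\kappa\rho}}\right| = \left|\frac{\partial y(\tau)}{\partial a_{\kappa\rho}}\right| = \left|\sum_{i=1}^{m}\sum_{\ell=1}^{n}\frac{\partial a_{i\ell}}{\partial a_{\kappa\rho}}\int_0^\tau\xi_\ell(t)\,u_i(\tau-t)\,dt\right|\le\int_0^\tau|\xi_\rho(t)\,u_\kappa(\tau-t)|\,dt\le nm\tau^{n-1}e^{\tau}M,$$

because $\sup_{t\in[0,\tau]}|\xi_\ell(t)|\le e^{\tau}\tau^{n-1}$ and $\partial a_{i\ell}/\partial a_{\kappa\rho} = 1$ if $(i,\ell) = (\kappa,\rho)$ and zero otherwise. Similarly we calculate

$$\left|\frac{\partial F(\lambda,u)}{\partial a_\kappa}\right| = \left|\frac{\partial y(\tau)}{\partial a_\kappa}\right|\le nm\tau^{n}e^{\tau}M \quad\text{and}\quad \left|\frac{\partial F(\lambda,u)}{\partial b_\kappa}\right| = \left|\frac{\partial y(\tau)}{\partial b_\kappa}\right|\le nm\tau^{n}e^{\tau}M,$$

as $\sup_{t\in[0,\tau]}\left|\frac{\partial\xi_\ell(t)}{\partial a_\kappa}\right|\le e^{\tau}\tau^{n}$ and $\sup_{t\in[0,\tau]}\left|\frac{\partial\xi_\ell(t)}{\partial b_\kappa}\right|\le e^{\tau}\tau^{n}$.

Now the Lipschitz constant can be taken to be $L = n^2me^{\tau}\tau^{n}M$, as

$$|F(\lambda,u) - F(\lambda^*,u)| = |\nabla F\cdot(\lambda-\lambda^*)|\le L\|\lambda-\lambda^*\|_\infty.$$

The number of system parameters is at most $nm + n = (m+1)n$, and we get the fat-shattering bound by applying Theorem 3.5 with space dimension $n(m+1)$ and $L = n^2me^{\tau}\tau^{n}M$. □

As a corollary, we combine the above result with a pseudo-dimension bound to prove the fat-shattering bound given in Theorem 2.2.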



Corollary 6.1 (Fat-shattering bound in Theorem 2.2). Assume that the system $\dot x = Ax + Bu$, $y = Cx$, $x(0) = 0$ can be parameterized by $\lambda\in\mathbb{R}^{n(m+1)}$, as in the parameterization used in Theorem 3.6, with $\|\lambda\|_\infty\le 1$, and assume in addition that the control is given by $u = G\omega$, where the input basis functions $\omega_j$ are in $\Omega$ given by (2.2). We denote the corresponding control system class by

$$\mathcal{F}_B = \{F(\lambda,\cdot) : U\to\mathbb{R}\ ;\ \lambda\in B\}.$$

Then

$$\operatorname{fat}_\gamma(\mathcal{F}_B)\le\min\left\{(m+1)n\log_2\left\lfloor\frac{n^2m\,\tau^{n}e^{\tau}kM}{\gamma}\right\rfloor,\ \ 2(m+4)n\log_2\!\big(8e\,(nmk\,4(n+\ell_{\max})+1)\,(2nk+2(2k+1)^n)\big)\right\},$$

where $\ell_{\max}$ is given by (2.2) and (2.3), and $M$ is a constant satisfying

$$\int_0^\tau|u_i(\tau-t)|\,dt\le kM$$

for all $i = 1,\dots,m$.

Proof. The first part of the bound follows from Theorem 3.6 with $kM$ in place of $M$.

The remaining part of the bound comes from the pseudo-dimension bound. First we derive the associated VC-dimension bound. As we assumed that $A$ has a fixed Jordan block structure, every entry of $e^{A(1-t)}$ is a linear combination of $n$ functions $\xi_1(t),\dots,\xi_n(t)$. (That is, we do not need to consider all possible functions over different Jordan block structures.) This implies that in the Goldberg-Jerrum argument of Section 5.1 we can take $\ell = mn + 4n$, $d = nmk\,4(n+\ell_{\max}) + 1$ and $s = 2nk + 2(2k+1)^n$.

Moreover, in that section the VC-dimension bounds were derived for the time interval $[0,1]$. However, the upper bound depends only on the number of system parameters and the degrees of the polynomials to be evaluated. Changing the time interval to $[0,\tau]$ just means that we replace the eigenvalue parameters $a$, $b$, $e^{a}\cos b$, $e^{a}\sin b$ (referring to the proof of Proposition 5.1) by $a\tau$, $b\tau$, $e^{a\tau}\cos b\tau$, $e^{a\tau}\sin b\tau$.

The above bound is also a bound for the pseudo-dimension. Observe that for $\mathcal{G} = \{g : X\to\mathbb{R}\}$, the pseudo-dimension can be defined as $\mathrm{PD}(\mathcal{G}) = \mathrm{VC}\{\mathrm{Ind}(x,y) = \operatorname{sign}(g(x)-y)\ ;\ g\in\mathcal{G}\}$. Hence we study the VC-dimension associated to $\operatorname{sign}(y(\tau)-z) = \operatorname{sign}(P/Q - z) = \operatorname{sign}(\tilde P/Q)$, where $\tilde P = P - zQ$ has the same degree as $P$ with respect to the parameters. Here $z$ is a new input, but the bound obtained with the Goldberg-Jerrum technique does not depend on the dimension of the inputs, and hence the above VC-dimension bound is also a bound for the pseudo-dimension. (Note that here, in the scale-sensitive setting, we do not apply the pseudo-dimension results of Section 5.3 using loss functions, as those rescaled the outputs.) □

Remark 6.2. In the special case of scalar controls (i.e., $m = 1$) that are given by a linear combination of $k$ fixed input functions, the control system class discussed can be viewed as a family of linear mappings in $\mathbb{R}^{k}$, as calculated in (2.5). Let $\mathcal{F} = \{x\mapsto\sum_i w_ix_i + \theta\ ;\ \|w\|_2 = 1\}$. If we further restrict $x$ such that $\|x\|_2\le R$ and $|\theta|\le R$ then
[ |X~8| , P^j have shown that 

$$\operatorname{fat}_\gamma(\mathcal{F})\le\min\{9R^2/\gamma^2,\ k+1\} + 1.$$

In this special case our $\gamma$-shattering result is of the form $c_1\log_2(\lfloor c_2/\gamma\rfloor)$, where $c_1$ and $c_2$ are constants. This gives an improvement over the hyperplane bound when the margin $\gamma$ is small.



7 A Class of Systems with VC-Dimension k 

For the control system (2.1) with scalar control $u(t) = \sum_{i=1}^{k} g_i\omega_i(t)$ and unrestricted $\omega_1,\dots,\omega_k$, the standard half-space argument gives an upper bound $k$. This bound is tight. We will give an example of a single-input, single-output, one-parameter family of control systems in dimension two that has VC-dimension $k$, when the controls are of the form $u(t) = \sum_{i=1}^{k} g_i\omega_i(t)$ and $\omega_i(1-t) = \mathbf{1}_{[2^{-i},\,2^{-i}+2^{a}]}$, where $a = -2(k+1)$. Consider a control system

$$\begin{aligned}
\dot x_1 &= x_2\\
\dot x_2 &= -\lambda^2x_1 + u\\
y &= \lambda x_1.
\end{aligned}\tag{7.1}$$

For the time interval $[0,1]$ and initial condition $(x_1,x_2) = (0,0)$, the output is given by $y(1) = \int_0^1\sin(\lambda t)\,u(1-t)\,dt$.

Lemma 7.1. Controls $\{\omega_1,\dots,\omega_k\}$ such that $\omega_i(1-t) = \mathbf{1}_{[2^{-i},\,2^{-i}+2^{a}]}$, where $a = -2(k+1)$, are shattered by the control system (7.1) with sign-observations.



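Proof. Let $T = \{2^{-i},\ i = 1,\dots,k\}$ and $J\subset T$. Define $\lambda_J = \pi\sum_{i=1}^{k} a_i2^{i}$, where $a_i = 1$ if $2^{-i}\notin J$ and $a_i = 0$ otherwise. Now if $t = 2^{-\ell}$, then

$$\lambda_Jt = \pi\sum_{i=1}^{k} a_i2^{i-\ell} = \pi\left(\sum_{i=1}^{\ell-1} a_i2^{i-\ell} + a_\ell + \sum_{i=\ell+1}^{k} a_i2^{i-\ell}\right) = \pi\left(\tfrac12c_2 + a_\ell + 2c_1\right),$$

where $c_1\in\mathbb{N}$ and $0\le c_2 < 2$.

Hence $\sin(\lambda_Jt) = \sin(\pi(\tfrac12c_2 + a_\ell))$. Note that if $a_\ell = 0$ then $\tfrac12c_2 + a_\ell\in[0,\,1-2^{-\ell}]$, and if $a_\ell = 1$ then $\tfrac12c_2 + a_\ell\in[1,\,2-2^{-\ell}]$. Thus $\sin(\pi(\tfrac12c_2 + a_\ell))\ge 0$ if $a_\ell = 0$ and $\sin(\pi(\tfrac12c_2 + a_\ell))\le 0$ if $a_\ell = 1$. Therefore,

$$\sin(\lambda_Jt)\ge 0\iff a_\ell = 0$$

and

$$\sin(\lambda_Jt)\ge 0\iff t\in J.$$

Further,

$$\int_{2^{-\ell}}^{2^{-\ell}+2^{a}}\sin(\lambda_Jt)\,dt > 0\iff a_\ell = 0,$$

where $a$ is taken so that

$$\sum_{j=1}^{k}2^{j}2^{a}\le 2^{-(k+1)}.\tag{7.2}$$

This assures that when $\ell\le k$ and $t\in[2^{-\ell},\,2^{-\ell}+2^{a}]$, we have, modulo $2\pi$,

$$\lambda_Jt\in\Big[0,\ \pi\Big(1 - 2^{-\ell} + \sum_{j=1}^{k}2^{j}2^{a}\Big)\Big]\subset[0,\pi),\quad\text{if } a_\ell = 0,$$

or similarly

$$\lambda_Jt\in[\pi,2\pi),\quad\text{if } a_\ell = 1.$$

In (7.2) we can take $a = -2(k+1)$, as $\sum_{j=1}^{k}2^{j} = 2^{k+1}-2$. In this way the integrand in

$$\int_{2^{-\ell}}^{2^{-\ell}+2^{a}}\sin(\lambda_Jt)\,dt$$

is either positive or negative.

For $S\subset\{1,\dots,k\}$, let $J = \{2^{-i},\ i\in S\}$. For each $\omega_i$,

$$\int_0^1\sin(\lambda_Jt)\,\omega_i(1-t)\,dt = \int_{2^{-i}}^{2^{-i}+2^{a}}\sin(\lambda_Jt)\,dt > 0\iff i\in S,$$

i.e., the set of controls $\{\omega_1,\dots,\omega_k\}$ is shattered by the mapping

$$\omega_i\mapsto\operatorname{sign}\left[\int_0^1\sin(\lambda_Jt)\,\omega_i(1-t)\,dt\right].$$

□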
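The shattering construction can be checked numerically for small $k$. The following is a sketch under stated assumptions, not the proof itself: the function names are illustrative, and the extra term $2^{k+1}$ added to the frequency is an assumption of this sketch (it shifts every phase $\lambda t$ at $t = 2^{-i}$ by a multiple of $2\pi$ and keeps $\lambda$ nonzero when $S$ is the full index set).

```python
import math
from itertools import combinations


def window_integral(lam, i, k):
    """Integral of sin(lam * t) over [2^-i, 2^-i + 2^a] with a = -2(k+1)."""
    t0 = 2.0 ** (-i)
    t1 = t0 + 2.0 ** (-2 * (k + 1))
    return (math.cos(lam * t0) - math.cos(lam * t1)) / lam


def frequency_for(S, k):
    """lam = pi * (sum of 2^i over indices i NOT in S, plus an assumed offset 2^(k+1))."""
    bits = sum(2 ** i for i in range(1, k + 1) if i not in S)
    return math.pi * (bits + 2 ** (k + 1))


def is_shattered(k):
    """Check that every subset S of {1..k} is picked out by some frequency."""
    indices = range(1, k + 1)
    for size in range(k + 1):
        for S in combinations(indices, size):
            lam = frequency_for(set(S), k)
            positive = {i for i in indices if window_integral(lam, i, k) > 0}
            if positive != set(S):
                return False
    return True
```

Calling `is_shattered(k)` for small `k` runs through all $2^k$ subsets, so the single parameter $\lambda$ realizes every dichotomy of the $k$ window controls.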



A An Example of the Goldberg-Jerrum Bound 

We begin this appendix with an informal discussion on the Goldberg-Jerrum technique 
used to prove the VC-dimension upper bounds in this paper. 

We want to write $y(1) = P/Q$, where $P$ and $Q$ are polynomials. Unfortunately, the value of $\operatorname{sign} y(1)$ cannot be obtained by just evaluating $P$ and $Q$, since $Q$ may have zeros. Therefore we need to write

$$y(1) = \begin{cases} P_1/Q_1, & \text{if } f_1\ne 0,\dots,f_\mu\ne 0\\ \quad\vdots\\ P_\gamma/Q_\gamma, & \text{if } f_1 = 0,\dots,f_\mu\ne 0\end{cases}$$

so that after evaluating $\mu$ polynomials $f_1,\dots,f_\mu$ we can pick a definition $P_i/Q_i$ without poles in a region defined by the $f_i$ polynomials. When $y(1)$ is defined in this way, $\operatorname{sign} y(1)$ can be easily expressed by a Boolean formula evaluating $2\gamma + \mu$ polynomial inequalities and equalities.

For simplicity we assume that $p = 1$ and the initial condition is $x(0) = 0$. Then, using the notation of Proposition 5.1, we write

$$y(1) = \sum_{i=1}^{m}\sum_{r=1}^{n}\sum_{\ell=1}^{2n} a_{ir\ell}\sum_{j=1}^{k} g_{ij}\int_0^1\xi_\ell(x_{r1},x_{r2},t)\,\omega_j(t)\,dt$$

and by the proof of Lemma 4.3

$$\int_0^1\xi_\ell(x_{r1},x_{r2},t)\,\omega_j(t)\,dt = \frac{P_{\ell j}(x_{r1},x_{r2},x_{r3},x_{r4})}{\big((x_{r1}+\alpha_j)^2+(x_{r2}+\beta_j)^2\big)^{z_{\ell j}}\big((x_{r1}+\alpha_j)^2+(x_{r2}-\beta_j)^2\big)^{z_{\ell j}}},$$

where $P_{\ell j}$ is some polynomial, $z_{\ell j}\in\mathbb{N}$ and $\omega_j(t) = t^{\ell_j}e^{\alpha_jt}\sin(\beta_jt)$ or $\omega_j(t) = t^{\ell_j}e^{\alpha_jt}\cos(\beta_jt)$. Hence the denominator of

$$\sum_{j=1}^{k} g_{ij}\int_0^1\xi_\ell(x_{r1},x_{r2},t)\,\omega_j(t)\,dt$$

is

$$\big((x_{r1}+\alpha_1)^2+(x_{r2}+\beta_1)^2\big)^{z_{\ell 1}}\big((x_{r1}+\alpha_1)^2+(x_{r2}-\beta_1)^2\big)^{z_{\ell 1}}\times\cdots\times\big((x_{r1}+\alpha_k)^2+(x_{r2}+\beta_k)^2\big)^{z_{\ell k}}\big((x_{r1}+\alpha_k)^2+(x_{r2}-\beta_k)^2\big)^{z_{\ell k}}.$$

By carrying out all summations, $y(1) = P/Q$. The denominator $Q$ consists of the product

$$\prod_{r=1}^{n}\Big\{\big((x_{r1}+\alpha_1)^2+(x_{r2}+\beta_1)^2\big)^{*}\big((x_{r1}+\alpha_1)^2+(x_{r2}-\beta_1)^2\big)^{*}\times\cdots\times\big((x_{r1}+\alpha_k)^2+(x_{r2}+\beta_k)^2\big)^{*}\big((x_{r1}+\alpha_k)^2+(x_{r2}-\beta_k)^2\big)^{*}\Big\},$$

where the $*$'s stand for some unspecified powers. Hence the zeros of $Q$ are determined by the $2nk$ polynomials

$$f_{ij1} = (x_{i1}+\alpha_j)^2+(x_{i2}+\beta_j)^2,$$
$$f_{ij2} = (x_{i1}+\alpha_j)^2+(x_{i2}-\beta_j)^2,$$

with $i = 1,\dots,n$, $j = 1,\dots,k$. The number of different sign assignments determining $\gamma$ is calculated as in the proof of Lemma 5.2.

Example

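The purpose of the following example is to illustrate the function $y = P/Q$ used in the Goldberg-Jerrum technique, together with the sequence of polynomial evaluations involved and a table for the final output depending on the outcomes of the polynomial evaluations.

Take $m = 1$, $n = 2$, $k = 2$, and assume that $A$ has complex eigenvalues $a\pm ib$. Take the basis input functions to be $\omega_1(t) = e^{t}$ and $\omega_2(t) = e^{2t}$. Then

$$y(1) = \sum_{\ell=1}^{2} a_\ell\sum_{j=1}^{2} g_j\int_0^1\xi_\ell(t)\,\omega_j(t)\,dt,$$

where $\xi_1(t) = e^{at}\sin(bt)$, $\xi_2(t) = e^{at}\cos(bt)$; $a_1$, $a_2$, $a$, $b$, $e^{a}\sin b$ and $e^{a}\cos b$ are system parameters, and $g_1$, $g_2$ are input parameters. By using the formulas

$$\int_0^1 e^{ct}\sin(bt)\,dt = \frac{e^{c}(c\sin b - b\cos b) + b}{c^2+b^2},\qquad\int_0^1 e^{ct}\cos(bt)\,dt = \frac{e^{c}(c\cos b + b\sin b) - c}{c^2+b^2},$$

we calculate the integrals appearing in the rationality condition, and we call them $r_{11}$, $r_{12}$, $r_{21}$, and $r_{22}$, where $r_{1j}$ and $r_{2j}$ denote the right-hand sides of the sine and cosine formulas, respectively, with $c = a + j$:

$$\int_0^1\xi_1(t)\,\omega_1(t)\,dt = \int_0^1 e^{(a+1)t}\sin(bt)\,dt = \begin{cases} r_{11}, & \text{if } (a+1)^2+b^2\ne 0,\\ 0, & \text{if } (a+1)^2+b^2 = 0,\end{cases}$$

$$\int_0^1\xi_1(t)\,\omega_2(t)\,dt = \int_0^1 e^{(a+2)t}\sin(bt)\,dt = \begin{cases} r_{12}, & \text{if } (a+2)^2+b^2\ne 0,\\ 0, & \text{if } (a+2)^2+b^2 = 0,\end{cases}$$

$$\int_0^1\xi_2(t)\,\omega_1(t)\,dt = \int_0^1 e^{(a+1)t}\cos(bt)\,dt = \begin{cases} r_{21}, & \text{if } (a+1)^2+b^2\ne 0,\\ 1, & \text{if } (a+1)^2+b^2 = 0,\end{cases}$$

and

$$\int_0^1\xi_2(t)\,\omega_2(t)\,dt = \int_0^1 e^{(a+2)t}\cos(bt)\,dt = \begin{cases} r_{22}, & \text{if } (a+2)^2+b^2\ne 0,\\ 1, & \text{if } (a+2)^2+b^2 = 0.\end{cases}$$

The computation of

$$\operatorname{sign} y(1) = \operatorname{sign}\left(\sum_{\ell=1}^{2} a_\ell\sum_{j=1}^{2} g_j\int_0^1\xi_\ell(t)\,\omega_j(t)\,dt\right)$$

is divided into three cases:

• Case $(a+1)^2+b^2\ne 0$, $(a+2)^2+b^2\ne 0$:

$$\operatorname{sign} y(1) = \operatorname{sign}(a_1g_1r_{11} + a_1g_2r_{12} + a_2g_1r_{21} + a_2g_2r_{22}) = \operatorname{sign}\left(\frac{P_1}{Q_1}\right)$$

• Case $(a+1)^2+b^2 = 0$, $(a+2)^2+b^2\ne 0$:

$$\operatorname{sign} y(1) = \operatorname{sign}(a_1g_2r_{12} + a_2g_1 + a_2g_2r_{22}) = \operatorname{sign}\left(\frac{P_2}{Q_2}\right)$$

• Case $(a+1)^2+b^2\ne 0$, $(a+2)^2+b^2 = 0$:

$$\operatorname{sign} y(1) = \operatorname{sign}(a_1g_1r_{11} + a_2g_1r_{21} + a_2g_2) = \operatorname{sign}\left(\frac{P_3}{Q_3}\right)$$

Thus we have three different expressions of the form $P/Q$.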
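The case distinction in this example can be mirrored in code. The following is a minimal sketch, assuming the standard closed forms for $\int_0^1 e^{ct}\sin(bt)\,dt$ and $\int_0^1 e^{ct}\cos(bt)\,dt$ and the convention that $\operatorname{sign}$ maps positive values to 1 and all other values to 0; the function names are illustrative.

```python
import math


def xi_omega_integral(kind, a, b, c):
    """Integral over [0, 1] of e^{(a+c) t} sin(b t) (kind="sin") or cos (kind="cos")."""
    A = a + c
    den = A * A + b * b
    if den == 0.0:
        # Pole of the rational expression: the integrand is identically 0 or 1.
        return 0.0 if kind == "sin" else 1.0
    if kind == "sin":
        return (math.exp(A) * (A * math.sin(b) - b * math.cos(b)) + b) / den
    return (math.exp(A) * (A * math.cos(b) + b * math.sin(b)) - A) / den


def sign_y1(a1, a2, a, b, g1, g2):
    """sign y(1) for inputs omega_1 = e^t and omega_2 = e^{2t}."""
    sin_part = g1 * xi_omega_integral("sin", a, b, 1) + g2 * xi_omega_integral("sin", a, b, 2)
    cos_part = g1 * xi_omega_integral("cos", a, b, 1) + g2 * xi_omega_integral("cos", a, b, 2)
    y = a1 * sin_part + a2 * cos_part
    return 1 if y > 0 else 0
```

The branch on `den == 0.0` plays the role of the equality tests on $(a+1)^2+b^2$ and $(a+2)^2+b^2$: it selects which pole-free expression is evaluated.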

Next we form the Boolean formula, $F = \operatorname{sign} y(1)$, evaluating the polynomial conditions $f_1 = (a+1)^2+b^2 = 0$, $f_2 = (a+2)^2+b^2 = 0$, $P_i > 0$, $Q_i > 0$, for $i\in\{1,2,3\}$. In the following table, 1 means true and 0 means false for the above polynomial evaluations (** = 1 or 0, i.e., extend the table).

[Table: columns $f_1 = 0$, $f_2 = 0$, $P_1 > 0$, $Q_1 > 0$, $P_2 > 0$, $Q_2 > 0$, $P_3 > 0$, $Q_3 > 0$, and the resulting output $F$; the row entries are not recoverable from the extracted text.]

In this case (see the statement of the Goldberg-Jerrum bounds) $\lambda = \{a_1,a_2,a,b,e^{a}\cos b,e^{a}\sin b\}$, $x = \{g_1,g_2\}$, $s = 8$, $d = 12$, and $\ell = 6$.



B Functions Needed to Express an Exponential of 
a Real Matrix 

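Taking $2n^2$ functions $\xi_{r\ell}(t)$, $r = 1,\dots,n$, $\ell = 1,\dots,2n$, to express $e^{At}$ in Proposition 5.1 is clear overcounting. In this appendix we estimate how many functions are needed to express $e^{At}$ for any $n\times n$ real matrix $A$.

Let us list the complex eigenvalues of $A$ as $a_1\pm ib_1,\dots,a_\kappa\pm ib_\kappa$, where $\kappa\le\lfloor n/2\rfloor$, in decreasing order by their geometric multiplicities. We introduce the functions

$$\begin{aligned}
a_1\pm ib_1:\quad M_{a_1\pm ib_1} &:= \{e^{a_1t}\cos(b_1t),\ te^{a_1t}\cos(b_1t),\ \dots,\ t^{\lfloor n/2\rfloor-1}e^{a_1t}\cos(b_1t),\\
&\qquad e^{a_1t}\sin(b_1t),\ te^{a_1t}\sin(b_1t),\ \dots,\ t^{\lfloor n/2\rfloor-1}e^{a_1t}\sin(b_1t)\},\\
a_2\pm ib_2:\quad M_{a_2\pm ib_2} &:= \{e^{a_2t}\cos(b_2t),\ \dots,\ t^{\lfloor n/4\rfloor-1}e^{a_2t}\cos(b_2t),\\
&\qquad e^{a_2t}\sin(b_2t),\ \dots,\ t^{\lfloor n/4\rfloor-1}e^{a_2t}\sin(b_2t)\},\\
&\ \,\vdots\\
a_\kappa\pm ib_\kappa:\quad M_{a_\kappa\pm ib_\kappa} &:= \{e^{a_\kappa t}\cos(b_\kappa t),\ e^{a_\kappa t}\sin(b_\kappa t)\}.
\end{aligned}$$

Linear combinations of the above functions can express all possible matrices $e^{At}$ when $A$ has purely complex eigenvalues.

As real eigenvalues can have higher geometric multiplicities, we add the following functions:

$$\begin{aligned}
M_{a_1\pm ib_1} &:= M_{a_1\pm ib_1}\cup\{t^{\lfloor n/2\rfloor}e^{a_1t},\ \dots,\ t^{n-1}e^{a_1t}\},\\
M_{a_2\pm ib_2} &:= M_{a_2\pm ib_2}\cup\{t^{\lfloor n/4\rfloor}e^{a_2t},\ \dots,\ t^{\lfloor n/2\rfloor-1}e^{a_2t}\},\\
&\ \,\vdots\\
M_{a_\kappa\pm ib_\kappa} &:= M_{a_\kappa\pm ib_\kappa}\cup\{te^{a_\kappa t}\}.
\end{aligned}$$

To express $e^{At}$ when $A$ has real eigenvalues $a_\ell$, $\ell = \kappa+1,\dots,n$, with geometric multiplicity one, we add at most $\lfloor n/2\rfloor + 1$ functions $M_{a_\ell} = \{e^{a_\ell t}\}$ for $\ell = \kappa+1,\dots,n$. The total number of functions in

$$M_{a_1\pm ib_1}\cup\cdots\cup M_{a_\kappa\pm ib_\kappa}\cup M_{a_{\kappa+1}}\cup\cdots\cup M_{a_n}$$

is

$$\begin{aligned}
2\sum_{s=1}^{\lfloor n/2\rfloor}\left\lfloor\frac{n}{2s}\right\rfloor + \sum_{s=1}^{\lfloor n/2\rfloor}\left(\left\lfloor\frac{n}{s}\right\rfloor - \left\lfloor\frac{n}{2s}\right\rfloor\right) + \left\lfloor\frac n2\right\rfloor + 1
&\le\frac{3n}{2}\sum_{s=1}^{\lfloor n/2\rfloor}\frac1s + \left\lfloor\frac n2\right\rfloor + 1\\
&\le\frac{3n}{2}\left(\ln\left\lfloor\frac n2\right\rfloor + 1\right) + \left\lfloor\frac n2\right\rfloor + 1 = O(n\ln n),
\end{aligned}$$

using the fact that $\ln(k+1)\le\sum_{i=1}^{k}\frac1i\le\ln(k) + 1$.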
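The count above can be evaluated directly and compared with both the logarithmic bound and the naive count $2n^2$. This is a small illustrative sketch; the function names are hypothetical and $n\ge 2$ is assumed so that $\ln\lfloor n/2\rfloor$ is defined.

```python
import math


def functions_needed(n):
    """Number of functions in the union of the sets M, following the sum above."""
    pairs = n // 2
    complex_part = sum(2 * (n // (2 * s)) for s in range(1, pairs + 1))
    real_part = sum(n // s - n // (2 * s) for s in range(1, pairs + 1))
    return complex_part + real_part + pairs + 1


def log_bound(n):
    """Upper bound (3n/2)(ln floor(n/2) + 1) + floor(n/2) + 1, for n >= 2."""
    return 1.5 * n * (math.log(n // 2) + 1) + n // 2 + 1
```

For instance, `functions_needed(4)` gives 12, compared with $2n^2 = 32$ functions in the naive parameterization.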
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